Abstract. We consider unconstrained randomized optimization of convex objective functions. 
We analyze the Random Pursuit algorithm, which iteratively computes an approximate solution to 
the optimization problem by repeated optimization over a randomly chosen one-dimensional sub- 
space. This randomized method only uses zeroth-order information about the objective function and 
does not need any problem-specific parametrization. We prove convergence and give convergence 
rates for smooth objectives assuming that the one-dimensional optimization can be solved exactly 
or approximately by an oracle. A convenient property of Random Pursuit is its invariance under 
strictly monotone transformations of the objective function. It thus enjoys identical convergence 
behavior on a wider function class. To support the theoretical results we present extensive numerical 
performance results of Random Pursuit, two gradient-free algorithms recently proposed by Nesterov, 
and a classical adaptive step-size random search scheme. We also present an accelerated heuristic 
version of the Random Pursuit algorithm which significantly improves standard Random Pursuit on 
all numerical benchmark problems. A general comparison of the experimental results reveals that (i) 
standard Random Pursuit is effective on strongly convex functions with moderate condition number, 
and (ii) the accelerated scheme is comparable to Nesterov's fast gradient method and outperforms 
adaptive step-size strategies. 
1. Introduction. Randomized zeroth-order optimization schemes were among 
the first algorithms proposed to numerically solve unconstrained optimization 
problems [IJ[()1I3I]. These methods are usually easy to implement, do not require gradient 
or Hessian information about the objective function, and comprise a randomized 
mechanism to iteratively generate new candidate solutions. In many areas of modern 
science and engineering such methods are indispensable in the simulation (or 
black-box) optimization context, where higher-order information about the simulation 
output is not available or does not exist. Compared to deterministic zeroth-order 
algorithms such as direct search methods [21] or interpolation methods [5], randomized 
schemes often show faster and more robust performance on ill-conditioned benchmark 
problems [2] and certain real-world applications such as quantum control [5] and 
parameter estimation in systems biology networks [39]. While probabilistic convergence 
guarantees even for non-convex objectives are readily available for many randomized 
algorithms [41], provable convergence rates are often not known or unrealistically 
slow. Notable exceptions can be found in the literature on adaptive step size random 
search (also known as Evolution Strategies) |H E], on Markov chain methods for 
volume estimation, rounding, and optimization [40], and in Nesterov's recent work on 
complexity bounds for gradient-free convex optimization [29]. 

Although Nesterov's algorithms are termed "gradient-free" their working mecha- 
nism does, in fact, rely on approximate directional derivatives that have to be available 
via a suitable oracle. We here relax this requirement and investigate a true randomized 
gradient- and derivative-free optimization algorithm: Random Pursuit ($\mathcal{RP}_\mu$). 
The method comprises two very elementary primitives: a random direction genera- 
tor and an (approximate) line search routine. We establish theoretical performance 
bounds of this algorithm for the unconstrained convex minimization problem 

$$\min f(x) \quad \text{subject to} \quad x \in \mathbb{R}^n, \tag{1.1}$$

where $f$ is a smooth convex function. We assume that there is a global minimum and 
that the curvature of the function $f$ can be bounded by a constant. Each iteration of 
Random Pursuit consists of two steps: A random direction is sampled uniformly at 
random from the unit sphere. The next iterate is chosen such as to (approximately) 
minimize the objective function along this direction. This method ranges among the 
simplest possible optimization schemes as it solely relies on two easy-to-implement 
primitives: a random direction generator and an (approximate) one-dimensional line 
search. A convenient feature of the algorithm is that it inherits the invariance un- 
der strictly monotone transformations of the objective function from the line search 
oracle. The algorithm thus enjoys convergence guarantees even for non-convex objec- 
tive functions that can be transformed into convex objectives via a suitable strictly 
monotone transformation. 

Although Random Pursuit is fully gradient- and derivative-free, it can still be un- 
derstood from the perspective of the classical gradient method. The gradient method 
(GM) is an iterative algorithm where the current approximate solution $x_k \in \mathbb{R}^n$ is 
improved along the direction of the negative gradient with some step size $\lambda_k$: 

$$x_{k+1} = x_k + \lambda_k \left(-\nabla f(x_k)\right). \tag{1.2}$$

When the descent direction is replaced by a random vector the generic scheme reads 

$$x_{k+1} = x_k + \lambda_k u, \tag{1.3}$$

where u is a random vector distributed uniformly over the unit sphere. A crucial 
aspect of the performance of this randomized scheme is the determination of the step 
size. Rastrigin [34] studied the convergence of this scheme on quadratic functions 
for fixed step sizes $\lambda_k$ where only improving steps are accepted. Many authors 
observed that variable step size methods yield faster convergence [24], [18]. Schumer and 
Steiglitz [36] were among the first to develop an effective step size adaptation rule 
which is based on the maximization of the expected one-step progress on the sphere 
function. A similar analysis has been independently obtained by Rechenberg for the 
(1+1)-Evolution Strategy (ES) [35]. Mutseniyeks and Rastrigin proposed to choose 
the step size such as to minimize the function value along the random direction [25]. 
This algorithm is identical to Random Pursuit with an exact line search. Convergence 
analyses on strongly convex functions have been provided by Krutikov [22] and 
Rappl [33]. Rappl proved linear convergence of Random Pursuit without giving exact 
convergence rates. Krutikov showed linear convergence in the special case where the search 
directions are given by $n$ linearly independent vectors which are used in cyclic order. 

Karmanov [HI |TT1 02] already conducted an analysis of Random Pursuit on gen- 
eral convex functions. Thus far, Karmanov's work has not been recognized by the 
optimization community but his results are very close to the work presented here. We 
enhance Karmanov's results in a number of ways: (i) we prove expected convergence 
rates also under approximate line search; (ii) we show that continuous sampling from 
the unit sphere can be replaced with discrete sampling from the set $\{\pm e_i : i = 1, \dots, n\}$ 
of signed unit vectors, without changing the expected convergence rates; (iii) we pro- 
vide a large number of experimental results, showing that Random Pursuit is a com- 
petitive algorithm in practice; (iv) we introduce a heuristic improvement of Random 
Pursuit that is even faster on all our benchmark functions; (v) we point out that 
Random Pursuit can also be applied to a number of relevant non-convex functions, 
without sacrificing any theoretical and practical performance guarantees. On the 
other hand, while we prove fast convergence only in expectation, Karmanov's more 
intricate analysis also yields fast convergence with high probability. 

Polyak [31] describes step size rules for the closely related randomized gradient 
descent scheme: 

$$x_{k+1} = x_k - \lambda_k \frac{f(x_k + \mu_k u) - f(x_k)}{\mu_k}\, u, \tag{1.4}$$

where convergence is proved for $\mu_k \to 0$ but no convergence rates are established. 
Nesterov [29] studied different variants of method (1.4) and its accelerated versions 
for smooth and non-smooth optimization problems. He showed that scheme (1.4) is 
at most $O(n^2)$ times slower than the standard (sub-)gradient method. The use of 
exact directional derivatives reduces the gap further to $O(n)$. For smooth problems 
the method is only $O(n)$ slower than the standard gradient method and accelerated 
versions are $O(n^2)$ slower than fast gradient methods. 

Kleiner et al. [20] studied a variant of algorithm (1.3) for unconstrained semidefinite 
programming: Random Conic Pursuit. There, each iteration comprises two steps: 
(i) the algorithm samples a rank-one matrix (not necessarily uniformly) at random; (ii) 
a two-dimensional optimization problem is solved that consists of finding the optimal 
linear combination of the rank-one matrix and the current semidefinite matrix. The 
solution determines the next iterate of the algorithm. In the case of trace-constrained 
semidefinite problems only a one-dimensional line search is necessary. Kleiner and co- 
workers proved convergence of this algorithm when directions are chosen uniformly at 
random. The dependency between convergence rate and dimension is, however, not 
known. Nonetheless, their work greatly inspired our own efforts, which is also reflected 
in the name "Random Pursuit" for the algorithm under study. 

The present article is structured as follows. In Section 2 we present the Random 
Pursuit algorithm with approximate line search. We introduce the necessary notation 
and formulate the assumptions on the objective function. In Section 3 we derive a 
number of useful results on the expectation of scaled random vectors. In Section 4 we 
calculate the expected one-step progress of Random Pursuit with approximate line 
search ($\mathcal{RP}_\mu$). We show that (besides some additive error term) this progress is by a 
factor of $O(n)$ worse than the one-step progress of the gradient method. These results 
allow us to derive the final convergence results in Section 5. We show that $\mathcal{RP}_\mu$ meets 
the convergence rates of the standard gradient method up to a factor of $O(n)$, i.e., 
linear convergence on strongly convex functions and convergence rate $1/k$ for general 
convex functions. The linear convergence on strongly convex functions is best possible: 
For the sphere function our method meets the lower bound [15]. For strongly convex 
objective functions the method is robust against small absolute or relative errors in the 
line search. In Section 6 we present numerical experiments on selected test problems. 
We compare $\mathcal{RP}_\mu$ with a fixed step size gradient method and a gradient scheme 
with line search, Nesterov's random gradient scheme and its accelerated version [29], 
an adaptive step size random search, and an accelerated heuristic version of $\mathcal{RP}_\mu$. 
In Section 7 we discuss the theoretical and numerical results as well as the present 
limitations of the scheme that may be alleviated by more elaborate randomization 
primitives. We also provide a number of promising future research directions. 



2. The Random Pursuit (RP) Algorithm. We consider problem (1.1) where 
$f$ is a differentiable convex function with bounded curvature (to be defined below). 
The algorithm $\mathcal{RP}_\mu$ is a variant of scheme (1.3) where the step sizes are determined 
by a line search. Formally, we define the following oracles: 

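Definition 2.1 (Line search oracle). For $x \in \mathbb{R}^n$, a convex function $f$, and a 
direction $u \in S^{n-1} = \{y \in \mathbb{R}^n : \|y\|_2 = 1\}$, a function $\mathrm{LS} : \mathbb{R}^n \times S^{n-1} \to \mathbb{R}$ with 

$$\mathrm{LS}(x, u) \in \arg\min_{h \in \mathbb{R}} f(x + hu) \tag{2.1}$$

is called an exact line search oracle. (Here, the argmin is not assumed to be unique, 
so we consider it as a set from which $\mathrm{LS}(x, u)$ selects a well-defined element.) For 
accuracy $\mu \ge 0$ the functions $\mathrm{LSAPPROX}^{\mathrm{abs}}_\mu$ and $\mathrm{LSAPPROX}^{\mathrm{rel}}_\mu$ with 

$$\mathrm{LS}(x, u) - \mu \le \mathrm{LSAPPROX}^{\mathrm{abs}}_\mu(x, u) \le \mathrm{LS}(x, u) + \mu, \quad \text{and} \tag{2.2}$$
$$s \cdot \max\{0, (1 - \mu)\} \cdot \mathrm{LS}(x, u) \le s \cdot \mathrm{LSAPPROX}^{\mathrm{rel}}_\mu(x, u) \le s \cdot \mathrm{LS}(x, u), \tag{2.3}$$

where $s = \mathrm{sign}(\mathrm{LS}(x, u))$, are, respectively, absolute and relative approximate line 
search oracles. By $\mathrm{LSAPPROX}_\mu$ we denote any of the two. 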

This means that we allow an inexact line search to return a value $h$ close to an 
optimal value $h^* = \mathrm{LS}(x, u)$. To simplify subsequent calculations, we also require that 
$|h| \le |h^*|$ in the case of relative approximation (cf. (2.3)), but this requirement is not 
essential. As the optimization problem (2.1) cannot be solved exactly in most cases 
we will describe and analyze our algorithm by means of the two latter approximation 
routines. 

The formal definition of algorithm $\mathcal{RP}_\mu$ is shown in Algorithm 1. At iteration $k$ 
of the algorithm a direction $u \in S^{n-1}$ is chosen uniformly at random and the next 
iterate $x_{k+1}$ is calculated from the current iterate $x_k$ as 

$$x_{k+1} := x_k + \mathrm{LSAPPROX}_\mu(x_k, u) \cdot u. \tag{2.4}$$



Algorithm 1 Random Pursuit ($\mathcal{RP}_\mu$) 

Input: A problem of the form (1.1); $N \in \mathbb{N}$: number of iterations; 
       $x_0$: an initial iterate; $\mu \ge 0$: line search accuracy 
Output: Approximate solution $x_N$ to (1.1). 

for $k \leftarrow 0$ to $N - 1$ do 
    choose $u_k$ uniformly at random from $S^{n-1}$ 
    set $x_{k+1} \leftarrow x_k + \mathrm{LSAPPROX}_\mu(x_k, u_k) \cdot u_k$ 
end for 
return $x_N$ 



This algorithm only requires function evaluations if the line search $\mathrm{LSAPPROX}_\mu$ 
is implemented appropriately (see [9] and references therein). No additional first or 
second-order information of the objective is needed. Note also that besides the start- 
ing point no further input parameters describing function properties (e.g. curvature 
constant, etc.) are necessary. The actual run time will, however, depend on the 
specific properties of the objective function. 
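
Algorithm 1 can be sketched in a few lines of Python. The snippet below is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the golden-section routine `line_search`, its bracket-doubling heuristic, and the quadratic test function are hypothetical choices that stand in for the oracle $\mathrm{LSAPPROX}_\mu$ with absolute accuracy $\mu$.

```python
import math
import random


def random_unit_vector(n, rng):
    """Sample uniformly from the unit sphere S^{n-1} by normalizing a Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def line_search(f, x, u, mu, max_iter=200):
    """Approximate argmin_h f(x + h*u) up to absolute error mu (assumed oracle).

    Relies on f being convex along the line, so phi(h) = f(x + h*u) is unimodal.
    """
    def phi(h):
        return f([xi + h * ui for xi, ui in zip(x, u)])

    # Double a symmetric bracket [-b, b] until it contains a minimizer of phi.
    b = 1.0
    for _ in range(60):
        if phi(b) >= phi(b / 2) and phi(-b) >= phi(-b / 2):
            break
        b *= 2.0

    # Golden-section search: shrink [lo, hi] using function values only.
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = -b, b
    c = hi - ratio * (hi - lo)
    d = lo + ratio * (hi - lo)
    fc, fd = phi(c), phi(d)
    for _ in range(max_iter):
        if hi - lo <= 2.0 * mu:
            break
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - ratio * (hi - lo)
            fc = phi(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + ratio * (hi - lo)
            fd = phi(d)
    return 0.5 * (lo + hi)


def random_pursuit(f, x0, num_iters, mu=1e-5, seed=0):
    """Run num_iters steps of x_{k+1} = x_k + LSAPPROX_mu(x_k, u_k) * u_k."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(num_iters):
        u = random_unit_vector(len(x), rng)
        h = line_search(f, x, u, mu)
        x = [xi + h * ui for xi, ui in zip(x, u)]
    return x


# Hypothetical test problem: a convex quadratic with condition number 5.
def quadratic(x):
    return 0.5 * sum((i + 1) * xi * xi for i, xi in enumerate(x))


x_start = [1.0] * 5
x_final = random_pursuit(quadratic, x_start, num_iters=300)
print(quadratic(x_start), quadratic(x_final))
```

Note that the sketch takes only the objective, a starting point, the iteration count, and the accuracy $\mu$ as inputs; no curvature constant is passed in, which mirrors the remark above.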
2.1. Discrete Sampling. As our analysis below reveals, the random vector 
$u_k$ enters the analysis only in terms of expectations of the form $\mathbb{E}[\langle x, u_k \rangle u_k]$ and 
$\mathbb{E}[\|\langle x, u_k \rangle u_k\|^2]$. In Lemmas 3.3 and 3.4 we show that these expectations are the same 
for $u_k \sim S^{n-1}$ and $u_k \sim \{\pm e_i : i = 1, \dots, n\}$, the set of signed unit vectors (here and 
in the following, the notation $x \sim A$ for a set $A$ denotes that $x$ is distributed according 
to the uniform distribution on $A$). It follows that continuous sampling from $S^{n-1}$ can 
be replaced with discrete sampling from $\{\pm e_i : i = 1, \dots, n\}$ without affecting our 
guarantees on the expected runtime. Under this modification, fast convergence still 
holds with high probability, but the bounds get worse [17]. 
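
For the discrete distribution the two expectations can be evaluated exactly by summing over all $2n$ signed unit vectors. The following sketch does this for one hypothetical vector `x` (the function names and the test vector are assumptions made for illustration) and compares the results with the values $x/n$ and $\|x\|^2/n$ stated in Lemma 3.4.

```python
def signed_unit_vectors(n):
    """Return the 2n vectors +e_i and -e_i in R^n."""
    vectors = []
    for i in range(n):
        for sign in (1.0, -1.0):
            e = [0.0] * n
            e[i] = sign
            vectors.append(e)
    return vectors


def discrete_moments(x):
    """Exact E[<x,u> u] and E[<x,u>^2] for u uniform on the signed unit vectors."""
    n = len(x)
    support = signed_unit_vectors(n)
    weight = 1.0 / len(support)
    mean_vec = [0.0] * n
    mean_sq = 0.0
    for u in support:
        inner = sum(xi * ui for xi, ui in zip(x, u))
        for j in range(n):
            mean_vec[j] += weight * inner * u[j]
        mean_sq += weight * inner * inner
    return mean_vec, mean_sq


x = [3.0, -1.0, 2.0, 0.5]
mean_vec, mean_sq = discrete_moments(x)
print(mean_vec)  # compare with x / n
print(mean_sq)   # compare with ||x||^2 / n
```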

2.2. Quasiconvex Functions. If $f$ and $g$ are functions, $g$ is called a strictly 
monotone transformation of $f$ if 

$$f(x) < f(y) \;\Leftrightarrow\; g(f(x)) < g(f(y)), \qquad x, y \in \mathbb{R}^n.$$

This implies that the distribution of $x_k$ in $\mathcal{RP}_\mu$ is the same for the function $f$ and 
the function $g \circ f$, if $g$ is a strictly monotone transformation of $f$. This follows from 
the fact that the result of the line search given in Definition 2.1 is invariant under 
strictly monotone transformations. 

This observation allows us to run $\mathcal{RP}_\mu$ on any strictly monotone transformation 
of any convex function $f$, with the same theoretical performance as on $f$ itself. The 
functions obtainable in this way form a subclass of the class of quasiconvex functions, 
and they include some non-convex functions as well. In Section 6.2.3 we will 
experimentally verify the invariance of $\mathcal{RP}_\mu$ under strictly monotone transformations on 
one instance of a quasiconvex function. 
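
The invariance can be illustrated with a line search that only compares function values. In the sketch below (an illustration under assumptions: the golden-section search on the fixed bracket $[-10, 10]$, the quadratic `f`, and the transformation $g(t) = \log(1 + t)$ are all hypothetical choices), the same random directions are used on $f$ and on the non-convex function $g \circ f$, and the two runs are expected to produce the same iterates up to floating-point effects in the line search.

```python
import math
import random


def golden_section(phi, lo, hi, iters=40):
    """Comparison-based line search: only the ordering of phi values is used."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - ratio * (hi - lo)
    d = lo + ratio * (hi - lo)
    fc, fd = phi(c), phi(d)
    for _ in range(iters):
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - ratio * (hi - lo)
            fc = phi(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + ratio * (hi - lo)
            fd = phi(d)
    return 0.5 * (lo + hi)


def pursuit(func, x0, steps, seed):
    """Random Pursuit with the comparison-based line search above."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        v = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(c * c for c in v))
        u = [c / norm for c in v]
        h = golden_section(
            lambda t: func([xi + t * ui for xi, ui in zip(x, u)]), -10.0, 10.0
        )
        x = [xi + h * ui for xi, ui in zip(x, u)]
    return x


def f(x):
    # Convex quadratic (hypothetical example).
    return x[0] ** 2 + 2.0 * x[1] ** 2 + 4.0 * x[2] ** 2


def g_of_f(x):
    # Strictly monotone transformation g(t) = log(1 + t); g o f is not convex.
    return math.log1p(f(x))


run_f = pursuit(f, [1.0, 1.0, 1.0], steps=50, seed=1)
run_g = pursuit(g_of_f, [1.0, 1.0, 1.0], steps=50, seed=1)
max_gap = max(abs(a - b) for a, b in zip(run_f, run_g))
print(max_gap)
```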

2.3. Function Basics. We now introduce some important inequalities that are 
useful for the subsequent analysis. We always assume that the objective function is 
differentiable and convex. The latter property is equivalent to 

$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle, \qquad x, y \in \mathbb{R}^n. \tag{2.5}$$

We also require that the curvature of $f$ is bounded. By this we mean that for some 
constant $L_1$, 

$$f(y) - f(x) - \langle \nabla f(x), y - x \rangle \le \frac{1}{2} L_1 \|x - y\|^2, \qquad x, y \in \mathbb{R}^n. \tag{2.6}$$

We will also refer to this inequality as the quadratic upper bound. It means that the 
deviation of $f$ from any of its linear approximations can be bounded by a quadratic 
function. We denote by $C^1_{L_1}$ the class of differentiable and convex functions for which 
the quadratic upper bound holds with the constant $L_1$. 

A differentiable function is strongly convex with parameter $m$ if the quadratic 
lower bound 

$$f(y) - f(x) - \langle \nabla f(x), y - x \rangle \ge \frac{m}{2} \|y - x\|^2, \qquad x, y \in \mathbb{R}^n, \tag{2.7}$$

holds. Let $x^*$ be the unique minimizer of a strongly convex function $f$ with parameter 
$m$. Then equation (2.7) implies this useful relation: 

$$\frac{m}{2} \|x - x^*\|^2 \le f(x) - f(x^*) \le \frac{1}{2m} \|\nabla f(x)\|^2, \qquad \forall x \in \mathbb{R}^n. \tag{2.8}$$

The former inequality uses $\nabla f(x^*) = 0$, and the latter one follows from (2.7) via 

$$f(x^*) \ge f(x) + \langle \nabla f(x), x^* - x \rangle + \frac{m}{2} \|x^* - x\|^2 
\ge f(x) + \min_{y \in \mathbb{R}^n} \left( \langle \nabla f(x), y - x \rangle + \frac{m}{2} \|y - x\|^2 \right) = f(x) - \frac{1}{2m} \|\nabla f(x)\|^2$$

by standard calculus. 

3. Expectation of Scaled Random Vectors. We now study the projection of 
a fixed vector x onto a random vector u. This will help analyze the expected progress 
of Algorithm 1. We start with the case $u \sim \mathcal{N}(0, I_n)$ and then extend it to $u \sim S^{n-1}$. 
Throughout this section, let $x \in \mathbb{R}^n$ be a fixed vector and $u \in \mathbb{R}^n$ a random vector 
drawn according to some distribution. We will need the following facts about the 
moments of the standard normal distribution. 

Lemma 3.1. 

(i) Let $\nu \sim \mathcal{N}(0, 1)$ be drawn from the standard normal distribution over the 
reals. Then 

$$\mathbb{E}[\nu] = \mathbb{E}[\nu^3] = 0, \qquad \mathbb{E}[\nu^2] = 1, \qquad \mathbb{E}[\nu^4] = 3.$$

(ii) Let $u \sim \mathcal{N}(0, I_n)$ be drawn from the standard normal distribution over $\mathbb{R}^n$. 
Then 

$$\mathbb{E}_u[u u^T] = I_n, \qquad \mathbb{E}_u[(u^T u)\, u u^T] = (n + 2) I_n.$$

Proof. Part (i) is standard, and the latter two matrix equations easily follow from 
(i) via 

$$(u u^T)_{ij} = u_i u_j, \qquad \left((u^T u)\, u u^T\right)_{ij} = u_i u_j \sum_k u_k^2. \qquad \Box$$



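Lemma 3.2 (Normal distribution). Let $u \sim \mathcal{N}(0, I_n)$. Then 

$$\mathbb{E}_u[\langle x, u \rangle u] = x, \quad \text{and} \quad \mathbb{E}_u\left[\|\langle x, u \rangle u\|^2\right] = (n + 2) \|x\|^2.$$

Proof. We calculate 

$$\mathbb{E}_u[\langle x, u \rangle u] = \mathbb{E}_u[u u^T x] = \mathbb{E}_u[u u^T]\, x = x,$$

by Lemma 3.1(ii). For the second moment we get 

$$\mathbb{E}_u\left[\|\langle x, u \rangle u\|^2\right] = \mathbb{E}_u[x^T (u^T u)\, u u^T x] = x^T\, \mathbb{E}_u[(u^T u)\, u u^T]\, x = (n + 2) \|x\|^2,$$

again using Lemma 3.1(ii). □ 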

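Lemma 3.3 (Spherical distribution). Let $u \sim S^{n-1}$. Then 

$$\mathbb{E}_u[\langle x, u \rangle u] = \frac{1}{n} x, \quad \text{and} \quad \mathbb{E}_u\left[\|\langle x, u \rangle u\|^2\right] = \mathbb{E}_u\left[\langle x, u \rangle^2\right] = \frac{1}{n} \|x\|^2.$$

Proof. Let $v \sim \mathcal{N}(0, I_n)$. We observe that the random vector $w = v / \|v\|$ has the 
same distribution as $u$. In particular, 

$$\mathbb{E}_u[u u^T] = \mathbb{E}_v\left[\frac{v v^T}{\|v\|^2}\right] = \frac{\mathbb{E}_v[v v^T]}{\mathbb{E}_v\left[\|v\|^2\right]} = \frac{I_n}{n}, \tag{3.1}$$

where we have used that the two random variables $v / \|v\|$ and $\|v\|$ are independent 
(see [H]), along with 

$$\mathbb{E}_v[v v^T] = I_n, \qquad \mathbb{E}_v\left[\|v\|^2\right] = n,$$

a consequence of Lemma 3.1. Now we use (3.1) to compute 

$$\mathbb{E}_u[\langle x, u \rangle u] = \mathbb{E}_u[u u^T]\, x = \frac{I_n}{n} x = \frac{1}{n} x$$

and 

$$\mathbb{E}_u\left[\langle x, u \rangle^2\right] = \mathbb{E}_u[x^T u u^T x] = x^T\, \mathbb{E}_u[u u^T]\, x = x^T \frac{I_n}{n} x = \frac{1}{n} \|x\|^2.$$

□ 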
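
The identities of Lemma 3.3 are easy to check numerically. The sketch below estimates both expectations by Monte Carlo sampling from the sphere (the sample size, the seed, and the test vector are arbitrary assumptions) and should be read as a sanity check rather than as part of the proof.

```python
import math
import random


def sphere_moments(x, num_samples=100000, seed=0):
    """Monte Carlo estimates of E[<x,u> u] and E[<x,u>^2] for u uniform on S^{n-1}."""
    rng = random.Random(seed)
    n = len(x)
    mean_vec = [0.0] * n
    mean_sq = 0.0
    for _ in range(num_samples):
        # Normalizing a standard Gaussian vector gives a uniform point on the sphere.
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(c * c for c in v))
        u = [c / norm for c in v]
        inner = sum(xi * ui for xi, ui in zip(x, u))
        for j in range(n):
            mean_vec[j] += inner * u[j]
        mean_sq += inner * inner
    return [m / num_samples for m in mean_vec], mean_sq / num_samples


x_vec = [3.0, -1.0, 2.0, 0.5]
est_vec, est_sq = sphere_moments(x_vec)
print(est_vec)  # compare with x / n
print(est_sq)   # compare with ||x||^2 / n
```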



The same result can be derived when the vector u is chosen to be a random signed 
unit vector. 

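Lemma 3.4. Let $u \sim U := \{\pm e_i : i = 1, \dots, n\}$ where $e_i$ denotes the $i$-th standard 
unit vector in $\mathbb{R}^n$. Then 

$$\mathbb{E}_u[\langle x, u \rangle u] = \frac{1}{n} x, \quad \text{and} \quad \mathbb{E}_u\left[\|\langle x, u \rangle u\|^2\right] = \mathbb{E}_u\left[\langle x, u \rangle^2\right] = \frac{1}{n} \|x\|^2.$$

Proof. We calculate 

$$\mathbb{E}_u[\langle x, u \rangle u] = \frac{1}{2n} \sum_{u \in U} \langle x, u \rangle u = \frac{1}{n} \sum_{i=1}^{n} x_i e_i = \frac{1}{n} x,$$

and similarly 

$$\mathbb{E}_u\left[\langle x, u \rangle^2\right] = \frac{1}{2n} \sum_{u \in U} \langle x, u \rangle^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 = \frac{1}{n} \|x\|^2. \qquad \Box$$
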

4. Single Step Progress. To prepare the convergence proof of Algorithm $\mathcal{RP}_\mu$ 
in the next section, we study the expected progress in a single step, which is the 
quantity 

$$\mathbb{E}[f(x_{k+1}) \mid x_k].$$

It turns out that we need to proceed differently, depending on whether the function 
under consideration is strongly convex (the easier case) or not. We start with a 
preparatory lemma for both cases. We first analyze the case when an approximate 
line search with absolute error is applied. Using an approximate line search with 
relative error will be reduced to the case of an exact line search. 

4.1. Line Search with Absolute Error. 

Lemma 4.1 (Absolute Error). Let $f \in C^1_{L_1}$ and let $x_k \in \mathbb{R}^n$ be the current iterate 
and $x_{k+1} \in \mathbb{R}^n$ the next iterate generated by algorithm $\mathcal{RP}_\mu$ with absolute line search 
accuracy $\mu$. For every positive $h \in \mathbb{R}$ and every point $z \in \mathbb{R}^n$ we have 

$$\mathbb{E}[f(x_{k+1}) \mid x_k] \le f(x_k) + \frac{h}{n} \langle \nabla f(x_k), z - x_k \rangle + \frac{L_1 h^2}{2n} \|z - x_k\|^2 + \frac{L_1 \mu^2}{2}.$$

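Proof. Let $x'_{k+1} := x_k + \mathrm{LS}(x_k, u_k)\, u_k$ be the exact line search optimum. Here, 
$u_k \in S^{n-1}$ is the chosen search direction. By definition of the approximate line 
search (2.2), we have 

$$f(x_{k+1}) \le \max_{|\nu| \le \mu} f(x'_{k+1} + \nu u_k) 
\le f(x'_{k+1}) + \max_{|\nu| \le \mu} \left( \langle \nabla f(x'_{k+1}), \nu u_k \rangle + \frac{L_1}{2} \nu^2 \right) 
= f(x'_{k+1}) + \frac{L_1 \mu^2}{2}, \tag{4.1}$$

where we used the quadratic upper bound (2.6) in the second inequality with $x = x'_{k+1}$ 
and $y = x'_{k+1} + \nu u_k$. The final equality holds because $\langle \nabla f(x'_{k+1}), u_k \rangle = 0$ at the 
exact line search optimum. 

Since $x'_{k+1}$ is the exact line search optimum, we in particular have 

$$f(x'_{k+1}) \le f(x_k + t_k u_k) \le f(x_k) + \langle \nabla f(x_k), t_k u_k \rangle + \frac{L_1 t_k^2}{2}, \qquad \forall t_k \in \mathbb{R}, \tag{4.2}$$

where we have applied (2.6) a second time. Putting together (4.1) and (4.2), and 
taking expectations, we get 

$$\mathbb{E}[f(x_{k+1}) \mid x_k] \le f(x_k) + \mathbb{E}\left[\langle \nabla f(x_k), t_k u_k \rangle \mid x_k\right] + \frac{L_1}{2}\, \mathbb{E}\left[t_k^2 \mid x_k\right] + \frac{L_1 \mu^2}{2}. \tag{4.3}$$

Now it is time to choose $t_k$ such that we can control the expectations on the right-hand 
side. We set 

$$t_k := h \langle z - x_k, u_k \rangle,$$

where $h > 0$ and $z \in \mathbb{R}^n$ are the "free parameters" of the lemma. Via Lemma 3.3 
this entails 

$$\mathbb{E}_{u_k}[t_k u_k] = \frac{h}{n} (z - x_k), \qquad \mathbb{E}_{u_k}\left[t_k^2\right] = \frac{h^2}{n} \|z - x_k\|^2,$$

and the lemma follows. □ 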

4.2. Line Search with Relative Error. In the case of relative line search 
error, we can prove a variant of Lemma 4.1 with a denominator n' slightly larger than 
n. As a result, the analysis under relative line search error reduces to the analysis of 
exact line search (approximate line search error 0) in a slightly higher dimension; in 
the sequel, we will therefore only deal with absolute line search error. 

Lemma 4.2 (Relative Error). Let $f \in C^1_{L_1}$ and let $x_k \in \mathbb{R}^n$ be the current iterate 
and $x_{k+1} \in \mathbb{R}^n$ the next iterate generated by algorithm $\mathcal{RP}_\mu$ with relative line search 
accuracy $\mu$. For every positive $h \in \mathbb{R}$ and every point $z \in \mathbb{R}^n$ we have 

$$\mathbb{E}[f(x_{k+1}) \mid x_k] \le f(x_k) + \frac{h}{n'} \langle \nabla f(x_k), z - x_k \rangle + \frac{L_1 h^2}{2n'} \|z - x_k\|^2,$$

where $n' = n / (1 - \mu)$. 

Proof. By the definition (2.3) of relative line search error, $x_{k+1}$ is a convex 
combination of $x_k$ and $x'_{k+1}$, the exact line search optimum. More precisely, we can 
compute that 

$$x_{k+1} = (1 - \gamma)\, x_k + \gamma\, x'_{k+1},$$

where $\gamma \ge 1 - \mu$. By convexity of $f$, we thus have 

$$f(x_{k+1}) \le (1 - \gamma) f(x_k) + \gamma f(x'_{k+1}) \le \mu f(x_k) + (1 - \mu) f(x'_{k+1}),$$

since $f(x'_{k+1}) \le f(x_k)$. Hence 

$$\mathbb{E}[f(x_{k+1}) \mid x_k] \le \mu f(x_k) + (1 - \mu)\, \mathbb{E}[f(x'_{k+1}) \mid x_k]. \tag{4.4}$$

Using Lemma 4.1 with absolute line search error 0 yields a bound for the latter term: 

$$\mathbb{E}[f(x'_{k+1}) \mid x_k] \le f(x_k) + \frac{h}{n} \langle \nabla f(x_k), z - x_k \rangle + \frac{L_1 h^2}{2n} \|z - x_k\|^2.$$

Putting this together with (4.4) yields 

$$\mathbb{E}[f(x_{k+1}) \mid x_k] \le f(x_k) + (1 - \mu) \left( \frac{h}{n} \langle \nabla f(x_k), z - x_k \rangle + \frac{L_1 h^2}{2n} \|z - x_k\|^2 \right),$$

and with $n' = n / (1 - \mu)$, the lemma follows. □ 

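4.3. Towards the Strongly Convex Case. Here we use $z = x_k - \nabla f(x_k)$ in 
Lemma 4.1. 

Corollary 4.3. Let $f \in C^1_{L_1}$ and let $x_k \in \mathbb{R}^n$ be the current iterate and $x_{k+1} \in \mathbb{R}^n$ 
the next iterate generated by algorithm $\mathcal{RP}_\mu$ with absolute line search accuracy $\mu$. 
For any positive $h_k \le \frac{1}{L_1}$ it holds that 

$$\mathbb{E}[f(x_{k+1}) \mid x_k] \le f(x_k) - \frac{h_k}{2n} \|\nabla f(x_k)\|^2 + \frac{L_1 \mu^2}{2}.$$

Proof. Lemma 4.1 with $z = x_k - \nabla f(x_k)$ yields 

$$\mathbb{E}[f(x_{k+1}) \mid x_k] \le f(x_k) - \frac{h_k}{n} \langle \nabla f(x_k), \nabla f(x_k) \rangle + \frac{L_1 h_k^2}{2n} \|\nabla f(x_k)\|^2 + \frac{L_1 \mu^2}{2}.$$

We conclude 

$$\mathbb{E}[f(x_{k+1}) \mid x_k] \le f(x_k) - \frac{h_k}{n} \left(1 - \frac{L_1 h_k}{2}\right) \|\nabla f(x_k)\|^2 + \frac{L_1 \mu^2}{2},$$

and the claim follows since $1 - L_1 h_k / 2 \ge 1/2$ for $h_k \le 1/L_1$. □ 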



4.4. Towards the General Convex Case. For this case, we apply Lemma 4.1 
with $z = x^*$. 

Corollary 4.4. Let $f \in C^1_{L_1}$ and let $x_k \in \mathbb{R}^n$ be the current iterate and $x_{k+1} \in \mathbb{R}^n$ 
the next iterate generated by algorithm $\mathcal{RP}_\mu$ with absolute line search accuracy $\mu$. 
Let $x^* \in \mathbb{R}^n$ be one of the minimizers of the function $f$. For any positive $h_k$ it 
holds that 

$$\mathbb{E}[f(x_{k+1}) - f(x^*) \mid x_k] \le \left(1 - \frac{h_k}{n}\right) (f(x_k) - f(x^*)) + \frac{L_1 h_k^2}{2n} \|x^* - x_k\|^2 + \frac{L_1 \mu^2}{2}.$$

Proof. We use Lemma 4.1 with $z = x^*$ and apply convexity (2.5) to bound the 
term $\langle \nabla f(x_k), x^* - x_k \rangle$ from above by $f(x^*) - f(x_k)$. Subtracting $f(x^*)$ from both 
sides yields the inequality of the corollary. □ 
5. Convergence Results. Here we use the previously derived bounds on the 
expected single step progress (Corollaries 4.3 and 4.4) to show convergence of the 
algorithm. 

5.1. Convergence Analysis for Strongly Convex Functions. We first prove 
that algorithm $\mathcal{RP}_\mu$ converges linearly in expectation on strongly convex functions. 
Despite strong convexity being a global property, it is sufficient if the function is 
strongly convex in the neighborhood of its minimizer (see Theorem 5.2). 

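Theorem 5.1. Let $f \in C^1_{L_1}$ and let $f$ be strongly convex with parameter $m$, and 
consider the sequence $\{x_k\}_{k \ge 0}$ generated by $\mathcal{RP}_\mu$ with absolute line search accuracy 
$\mu$. Then for any $N > 0$, we have 

$$\mathbb{E}[f(x_N) - f(x^*)] \le \left(1 - \frac{m}{L_1 n}\right)^N (f(x_0) - f(x^*)) + \frac{L_1^2 n \mu^2}{2m}.$$

Proof. We use Corollary 4.3 with $h_k = \frac{1}{L_1}$ and the quadratic lower bound to 
estimate the progress in one step as 

$$\mathbb{E}[f(x_{k+1}) - f(x^*) \mid x_k] \le f(x_k) - f(x^*) - \frac{1}{2 n L_1} \|\nabla f(x_k)\|^2 + \frac{L_1 \mu^2}{2} 
\le \left(1 - \frac{m}{n L_1}\right) (f(x_k) - f(x^*)) + \frac{L_1 \mu^2}{2}.$$

After taking expectations (over $x_k$), the tower property of conditional expectations 
yields the recurrence 

$$\mathbb{E}[f(x_{k+1}) - f(x^*)] \le \left(1 - \frac{m}{n L_1}\right) \mathbb{E}[f(x_k) - f(x^*)] + \frac{L_1 \mu^2}{2}.$$

This implies 

$$\mathbb{E}[f(x_N) - f(x^*)] \le \left(1 - \frac{m}{n L_1}\right)^N (f(x_0) - f(x^*)) + \omega(N)$$

with 

$$\omega(N) := \frac{L_1 \mu^2}{2} \sum_{i=0}^{N-1} \left(1 - \frac{m}{L_1 n}\right)^i \le \frac{L_1 \mu^2}{2} \cdot \frac{L_1 n}{m} = \frac{L_1^2 n \mu^2}{2m}.$$

The bound of the theorem follows. □ 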

We remark that by strong convexity the error $\|x_N - x^*\|$ can also be bounded 
using the results of this theorem. Thus, the algorithm does not only converge in terms 
of function value, but also in terms of the solution itself. 
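
To get a feeling for the bound of Theorem 5.1, the short sketch below evaluates its right-hand side and searches for the smallest iteration count that pushes it below a target accuracy. The function names and the numerical values of $m$, $L_1$, $n$, and the initial gap are hypothetical and serve only as an illustration of the arithmetic.

```python
def strongly_convex_bound(num_iters, gap0, m, L1, n, mu=0.0):
    """Right-hand side of the Theorem 5.1 bound on E[f(x_N) - f(x*)]."""
    rate = 1.0 - m / (L1 * n)
    return rate ** num_iters * gap0 + (L1 ** 2 * n * mu ** 2) / (2.0 * m)


def iterations_for_accuracy(eps, gap0, m, L1, n, mu=0.0):
    """Smallest N for which the Theorem 5.1 bound is at most eps."""
    floor = (L1 ** 2 * n * mu ** 2) / (2.0 * m)  # additive term that never vanishes
    if eps <= floor:
        raise ValueError("eps lies below the error floor set by the line search accuracy")
    num_iters = 0
    while strongly_convex_bound(num_iters, gap0, m, L1, n, mu) > eps:
        num_iters += 1
    return num_iters


# Hypothetical problem: n = 10, m = 1, L1 = 10, initial gap f(x_0) - f(x*) = 1.
needed = iterations_for_accuracy(1e-6, gap0=1.0, m=1.0, L1=10.0, n=10)
print(needed)
```

With exact line search ($\mu = 0$) the count grows like $(L_1 n / m) \ln(1/\varepsilon)$, which makes the factor $n$ relative to the gradient method visible.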

Each strongly convex function has a unique minimizer x* . Using the quadratic 
lower bound (2.8) we recall that: 

$$f(x) - f(x^*) \ge \frac{m}{2} \|x - x^*\|^2. \tag{5.1}$$



It turns out that instead of strong convexity (2.7) the weaker condition (5.1) is suffi- 
cient to have linear convergence. 

Theorem 5.2. Let $f \in C^1_{L_1}$ and suppose $f$ has a unique minimizer $x^*$ 
satisfying (5.1) with parameter $m$. Consider the sequence $\{x_k\}_{k \ge 0}$ generated by $\mathcal{RP}_\mu$ with 
absolute line search accuracy $\mu$. Then for any $N > 0$, we have 

$$\mathbb{E}[f(x_N) - f(x^*)] \le \left(1 - \frac{m}{4 L_1 n}\right)^N (f(x_0) - f(x^*)) + \frac{2 L_1^2 n \mu^2}{m}.$$

Proof. To see this we use Corollary 4.4 with property (5.1) to get 

$$\mathbb{E}[f(x_{k+1}) - f(x^*) \mid x_k] \le \left(1 - \frac{h_k}{n}\right) (f(x_k) - f(x^*)) + \frac{L_1 h_k^2}{2n} \|x_k - x^*\|^2 + \frac{L_1 \mu^2}{2} 
\le \left(1 - \frac{h_k}{n} + \frac{L_1 h_k^2}{m n}\right) (f(x_k) - f(x^*)) + \frac{L_1 \mu^2}{2}.$$

Setting $h_k$ to $\frac{m}{2 L_1}$, the term in the left bracket becomes $\left(1 - \frac{m}{4 L_1 n}\right)$. Now the proof 
continues as the proof of Theorem 5.1. □ 



5.2. Convergence Analysis for Convex Functions. We now prove that 
algorithm $\mathcal{RP}_\mu$ converges in expectation on smooth (not necessarily strongly) convex 
functions. The rate is, however, not linear anymore. 

Theorem 5.3. Let $f \in C^1_{L_1}$ and let $x^*$ be a minimizer of $f$, and let the sequence 
$\{x_k\}_{k \ge 0}$ be generated by $\mathcal{RP}_\mu$ with absolute line search accuracy $\mu$. Assume there 
exists $R$, s.t. $\|y - x^*\| < R$ for all $y$ with $f(y) \le f(x_0)$. Then for any $N > 0$, we 
have 

$$\mathbb{E}[f(x_N) - f(x^*)] \le \frac{Q}{N + 1} + \frac{N L_1 \mu^2}{2},$$

where 

$$Q = \max\left\{2 n L_1 R^2,\; f(x_0) - f(x^*)\right\}.$$



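Proof. By assumption, there exists an $R \in \mathbb{R}$, s.t. $\|x_k - x^*\| < R$, for all $k = 
0, 1, \dots, N$. With Corollary 4.4 it follows for any step size $h_k \ge 0$: 

$$\mathbb{E}[f(x_{k+1}) - f(x^*) \mid x_k] \le \left(1 - \frac{h_k}{n}\right) (f(x_k) - f(x^*)) + \frac{L_1 R^2 h_k^2}{2n} + \frac{L_1 \mu^2}{2}.$$

Taking expectation we obtain 

$$\mathbb{E}[f(x_{k+1}) - f(x^*)] \le \left(1 - \frac{h_k}{n}\right) \mathbb{E}[f(x_k) - f(x^*)] + \frac{L_1 R^2 h_k^2}{2n} + \frac{L_1 \mu^2}{2}. \tag{5.2}$$
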

By setting $h_k := \frac{2n}{k + 2}$ for $k = 0, \dots, (N - 1)$ we obtain a recurrence that is exactly of 
the form as treated in Lemma A.1 and the result follows. □ 

We note that for $\varepsilon > 0$ the exact algorithm $\mathcal{RP}_0$ needs $O\left(\frac{n}{\varepsilon}\right)$ steps to guarantee 
an approximation error of $\varepsilon$. According to the discussion preceding Lemma 4.2 this 
still holds under an approximate line search with fixed relative error. 

In the absolute error model, however, the error bound of Theorem 5.3 becomes 
meaningless as $N \to \infty$. Nevertheless, for $N_{\mathrm{opt}} = \sqrt{2Q / (L_1 \mu^2)}$ the bound yields 

$$\mathbb{E}[f(x_{N_{\mathrm{opt}}}) - f(x^*)] \le \frac{Q}{N_{\mathrm{opt}}} + \frac{N_{\mathrm{opt}} L_1 \mu^2}{2} = \mu \sqrt{2 L_1 Q}.$$
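
The trade-off between the two terms of the Theorem 5.3 bound can be made concrete with a few lines of arithmetic. The sketch below is only an illustration: the values chosen for $Q$, $L_1$, and $\mu$ are hypothetical, and the helper names are assumptions. It evaluates the bound at the horizon $N_{\mathrm{opt}}$ and at horizons that are ten times shorter and longer.

```python
import math


def convex_bound(num_iters, Q, L1, mu):
    """Right-hand side of the Theorem 5.3 bound: Q / (N + 1) + N * L1 * mu^2 / 2."""
    return Q / (num_iters + 1) + num_iters * L1 * mu ** 2 / 2.0


def best_horizon(Q, L1, mu):
    """Horizon N_opt = sqrt(2 Q / (L1 mu^2)) that balances the two terms."""
    return math.sqrt(2.0 * Q / (L1 * mu ** 2))


# Hypothetical constants for illustration only.
Q, L1, mu = 100.0, 2.0, 1e-3
n_opt = round(best_horizon(Q, L1, mu))
at_opt = convex_bound(n_opt, Q, L1, mu)
print(n_opt, at_opt, mu * math.sqrt(2.0 * L1 * Q))
print(convex_bound(n_opt // 10, Q, L1, mu), convex_bound(10 * n_opt, Q, L1, mu))
```

Running longer than $N_{\mathrm{opt}}$ does not help in this model, since the accumulated line search error then dominates the bound.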
5.3. Remarks. We emphasize that the constant $L_1$ and the strong-convexity 
parameter $m$ which describe the behavior of the function are only needed for the 
theoretical analysis of $\mathcal{RP}_\mu$. These parameters are not input parameters to the 
algorithm. No pre-calculation or estimation of these parameters is thus needed in order 
to use the algorithm on convex functions. Moreover, the presented analysis does not 
need parameters that describe the properties of the function on the whole domain. It 
is sufficient to restrict our view on the sub-level set determined by the initial iterate. 
Consequently if the function parameters get better in a neighborhood of the opti- 
mum, the performance of the algorithm may be better than theoretically predicted 
by the worst-case analysis. 

6. Computational Experiments. We complement the presented theoretical 
analysis with extensive numerical optimization experiments on selected benchmark 
functions. We compare the performance of the $\mathcal{RP}_\mu$ algorithm with a number of 
gradient-free algorithms that share the simplicity of Random Pursuit in terms of 
the computational search primitives used. We also introduce a heuristic acceleration 
scheme for Random Pursuit, the accelerated $\mathcal{RP}_\mu$ method ($\mathcal{ARP}_\mu$). As a generic 
reference we also consider two steepest descent schemes that use analytic gradient 
information. The test function set comprises two quadratic functions with different 
condition numbers, two variants of Nesterov's smooth function [28], and a non-convex 
funnel-shaped function. We first detail the algorithms, their input requirements, and 
necessary parameter choices. We then present the definition of the test functions, 
describe the experimental performance evaluation protocol, and present the numerical 
results. Further experimental data are available in the supporting online material [38] 
at http://arxiv.org/abs/1111.0194. 

6.1. Algorithms. We now introduce the set of tested algorithms. All methods 
have been implemented in MATLAB. The source code is also publicly available in the 
supporting online material [38]. 

6.1.1. Random Gradient Methods. We consider two randomized methods 
that are discussed in detail in [29]. The first algorithm, the Random Gradient Method 
(RG), implements the iterative scheme described in (1.4). A necessary ingredient for 
the algorithm is an oracle that provides directional derivatives. The accuracy of the 
directional derivatives is controlled by the finite difference step size $\mu$. A pseudo-code 
representation of the approximate Random Gradient method ($\mathcal{RG}_\mu$) along with 
a convergence proof is described in [29, Section 5]. We implemented $\mathcal{RG}_\mu$ and used 
the parameter setting $\mu$ = 1e-5. A necessary input to the $\mathcal{RG}_\mu$ algorithm is the 
function-dependent Lipschitz constant $L_1$ that is used to determine the step size 
$\lambda_k = 1/(4(n + 4) L_1)$. We also consider Nesterov's fast Random Gradient Method 
(FG) [IS]. This algorithm simultaneously evolves two iterates in the search space 
where, in each iteration, a directional derivative is approximately computed at specific 
linear combinations of these points. In [29, Section 6] Nesterov provides a pseudo-code 
for the approximate scheme $\mathcal{FG}_\mu$ and proves convergence on strongly convex 
functions. We implemented the $\mathcal{FG}_\mu$ scheme and used the parameter setting $\mu$ = 1e-5. 
Further necessary input parameters are both the $L_1$ constant and the strong 
convexity parameter $m$ of the respective test function. 

6.1.2. Random Pursuit Methods. In the implementation of the $\mathcal{RP}_\mu$ algorithm 
we choose the sampling directions uniformly at random from the hypersphere. 
We use the built-in MATLAB routine fminunc.m from the optimization toolbox [35] 
with optimset('TolX'=$\mu$) as approximate line search oracle $\mathrm{LSAPPROX}_\mu$ with $\mu$ = 1e-5. 
In the present gradient-free setting fminunc.m uses a mixed cubic/quadratic 
polynomial line search where the first three points bracketing the minimum are found by 
bisection [32]. 

Inspired by the $\mathcal{FG}$ scheme we also designed an accelerated Random Pursuit algorithm
($\mathcal{ARP}_\mu$), which is summarized in Algorithm 2. The structure of this algorithm



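Algorithm 2 Accelerated Random Pursuit ($\mathcal{ARP}_\mu$)
Input: $N$, $x_0$, $\mu$, $m$, $L_1$

1: $\theta = \frac{1}{L_1 n^2}$, $\gamma_0 \geq m$, $v_0 = x_0$
2: for $k = 0$ to $N$ do
3:   Compute $\beta_k > 0$ satisfying $\theta^{-1} \beta_k^2 = (1 - \beta_k)\gamma_k + \beta_k m =: \gamma_{k+1}$.
4:   Set $\lambda_k = \frac{\beta_k}{\gamma_{k+1}} m$, $\delta_k = \frac{\beta_k \gamma_k}{\gamma_k + \beta_k m}$, and $y_k = (1 - \delta_k) x_k + \delta_k v_k$.
5:   Choose $u_k$ uniformly at random from the unit sphere $S^{n-1}$.
6:   Set $x_{k+1} = y_k + \text{LSAPPROX}_\mu(y_k, u_k) \cdot u_k$.
7:   Set $v_{k+1} = (1 - \lambda_k) v_k + \lambda_k y_k + \frac{\text{LSAPPROX}_\mu(y_k, u_k)}{\beta_k n} u_k$.
8: end for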



is similar to Nesterov's $\mathcal{FG}_\mu$ scheme. In $\mathcal{ARP}_\mu$ the step size calculation is, however,
provided by the line search oracle. Although we currently lack theoretical guarantees
for this scheme, we report its experimental performance here. Analogously
to the $\mathcal{FG}_\mu$ algorithm, the accelerated $\mathcal{RP}_\mu$ algorithm needs the function-dependent
parameters $L_1$ and $m$ as necessary input. The line search oracle is identical to the
one in standard Random Pursuit.
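The two Random Pursuit variants can be sketched as follows. This Python sketch is illustrative and rests on several assumptions: the experiments use MATLAB's fminunc as line search oracle, whereas the example swaps in a simple bracketing plus golden-section search; the accelerated variant follows our reading of Algorithm 2 and of the MATLAB listing in Appendix C (in particular $\theta = 1/(L_1 n^2)$ and the update of $v$). All function names are hypothetical.

```python
import math
import random

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio


def line_search(phi, tol=1e-5):
    """Approximately minimize a convex 1-D function phi around 0.

    Doubles a symmetric bracket [-s, s] until both ends are no better
    than phi(0), then runs golden-section search down to width tol.
    """
    s, f0 = 1.0, phi(0.0)
    while phi(-s) < f0 or phi(s) < f0:
        s *= 2.0
    a, b = -s, s
    c, d = b - INV_PHI * (b - a), a + INV_PHI * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:  # minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = phi(c)
        else:  # minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)


def unit_direction(n, rng):
    """Uniform random direction on the unit sphere in R^n."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def move(x, t, u):
    """Return x + t * u."""
    return [xi + t * ui for xi, ui in zip(x, u)]


def random_pursuit(f, x0, iters, tol=1e-5, seed=0):
    """Standard Random Pursuit: line search along a random direction."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        u = unit_direction(len(x), rng)
        t = line_search(lambda s: f(move(x, s, u)), tol)
        x = move(x, t, u)
    return x


def accelerated_random_pursuit(f, x0, iters, m, L1, tol=1e-5, seed=0):
    """Accelerated Random Pursuit, following our reading of Algorithm 2."""
    rng = random.Random(seed)
    n = len(x0)
    theta, gamma = 1.0 / (L1 * n * n), m
    x, v = list(x0), list(x0)
    for _ in range(iters):
        # positive root of beta^2 / theta + (gamma - m) * beta - gamma = 0
        q = gamma - m
        beta = 0.5 * theta * (-q + math.sqrt(q * q + 4.0 * gamma / theta))
        delta = beta * gamma / (gamma + beta * m)
        y = [(1.0 - delta) * xi + delta * vi for xi, vi in zip(x, v)]
        gamma = (1.0 - beta) * gamma + beta * m
        lam = beta * m / gamma
        u = unit_direction(n, rng)
        t = line_search(lambda s: f(move(y, s, u)), tol)
        x = move(y, t, u)
        v = [(1.0 - lam) * vi + lam * yi + t / (beta * n) * ui
             for vi, yi, ui in zip(v, y, u)]
    return x
```

Neither routine needs a step size from the user: standard Random Pursuit is parameter-free apart from the line search tolerance, while the accelerated variant additionally needs $m$ and $L_1$.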

6.1.3. Adaptive Step Size Random Search Methods. The previous randomized
schemes proceed along random directions either by using pre-calculated step
sizes or by using line search oracles. In adaptive step size random search methods
the step size is dynamically controlled so as to approximately guarantee a certain
probability $p$ of finding an improving iterate. Schumer and Steiglitz [36] were among
the first to propose such a scheme. In the bio-inspired optimization literature, the
method is known as the (1+1)-Evolution Strategy ($\mathcal{ES}$) [55]. Jägersküpper [14] provides
a convergence proof of $\mathcal{ES}$ on convex quadratic functions. We here consider the
generic $\mathcal{ES}$ algorithm summarized in Algorithm 3.



Algorithm 3 (1+1)-Evolution Strategy ($\mathcal{ES}$) with adaptive step size control
Input: $N$, $x_0$, $\sigma_0$, probability of improvement $p = 0.27$

1: Set $c_s = e^{1/3}$ and $c_f = c_s \cdot e^{-p/(1-p)}$.
2: for $k = 0$ to $N$ do
3:   $u_k \sim \mathcal{N}(0, I_n)$
4:   if $f(x_k + \sigma_k u_k) < f(x_k)$ then
5:     Set $x_{k+1} = x_k + \sigma_k u_k$ and $\sigma_{k+1} = c_s \cdot \sigma_k$.
6:   else
7:     Set $x_{k+1} = x_k$ and $\sigma_{k+1} = c_f \cdot \sigma_k$.
8:   end if
9: end for

Depending on the specific random direction generator and the underlying test function,
different optimality conditions can be formulated for the probability $p$. Schumer
and Steiglitz [36] suggest the setting $p = 0.27$, which is also considered in this work.
For all of the considered test functions the initial step size $\sigma_0$ has been determined
experimentally in order to guarantee the targeted $p$ at the start (see Table B.1 for
the respective values). The $\mathcal{ES}$ algorithm shares $\mathcal{RP}_\mu$'s invariance under strictly
monotone transformations of the objective function.
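The adaptive step size scheme and its invariance property can be illustrated with a short Python sketch. It is not the authors' implementation. The function names are hypothetical, and the failure factor used here, $c_f = c_s^{-p/(1-p)}$, is our own assumption: it is chosen because it keeps $\sigma$ stationary when the empirical success rate equals $p$, and it may differ from the constant used in Algorithm 3. The initial step size 0.79158 is the value listed for the sphere in dimension 4 in Table B.1.

```python
import math
import random


def one_plus_one_es(f, x0, sigma0, iters, c_s, c_f, seed=0):
    """(1+1)-ES sketch: accept a Gaussian step only if it strictly
    improves f; scale the step size by c_s on success, c_f on failure."""
    rng = random.Random(seed)
    x, fx, sigma = list(x0), f(x0), sigma0
    for _ in range(iters):
        cand = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        fc = f(cand)  # one function evaluation per iteration
        if fc < fx:
            x, fx, sigma = cand, fc, c_s * sigma
        else:
            sigma = c_f * sigma
    return x


def sphere(x):
    """Shifted sphere 0.5 * ||x - 1||^2."""
    return 0.5 * sum((xi - 1.0) ** 2 for xi in x)


def funnel(x):
    """Strictly increasing transformation of the sphere value."""
    return math.log(1.0 + 10.0 * math.sqrt(2.0 * sphere(x)))


p = 0.27
c_s = math.exp(1.0 / 3.0)
c_f = c_s ** (-p / (1.0 - p))  # assumption: sigma stationary at success rate p

# Same seed, same start: only comparisons of f-values drive the algorithm,
# so both runs are expected to produce the same iterates.
x_sphere = one_plus_one_es(sphere, [0.0] * 4, 0.79158, 100, c_s, c_f)
x_funnel = one_plus_one_es(funnel, [0.0] * 4, 0.79158, 100, c_s, c_f)
```

Because the algorithm only compares function values, replacing the objective by a strictly increasing transformation of it leaves every accept/reject decision unchanged, which is the invariance stated above.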

6.1.4. First-order Gradient Methods. In order to illustrate the numerical
efficiency of the randomized zeroth-order schemes relative to that of first-order methods,
we also consider two Gradient Methods as outlined in (1.2). The first method
($\mathcal{GM}$) uses a fixed step size $\lambda_k = 1/L_1$ [35]. The function-dependent constant $L_1$ is,
thus, part of the input to the $\mathcal{GM}$ algorithm. The second method ($\mathcal{GM}_{LS}$) determines
the step size in each iteration using $\mathcal{RP}_\mu$'s line search oracle LSAPPROX$_\mu$ with
$\mu$ = 1e-5 as input parameter.

6.2. Benchmark Functions. We now present the set of test functions used 
for the numerical performance evaluation of the different optimization schemes. We 
present the three function classes and detail the specific function instances and their 
properties. 

6.2.1. Quadratic Functions. We consider quadratic test functions of the form

    $f(x) = \frac{1}{2}(x - \mathbf{1})^T Q (x - \mathbf{1})$,    (6.1)

where $x \in \mathbb{R}^n$ and $Q \in \mathbb{R}^{n \times n}$ is a diagonal matrix. For given $L_1$ the diagonal entries
are chosen in the interval $[1, L_1]$. The minimizer of this function class is $x^* = \mathbf{1}$ with
$f(x^*) = 0$. The derivative is $\nabla f(x) = Q(x - \mathbf{1})$. We consider two different matrix
instances. Setting $Q = I_n$, the $n$-dimensional identity matrix, the function reduces to
the shifted sphere function, denoted here by $f_1$. In order to get a quadratic function
with anisotropic axis lengths we use a matrix $Q$ whose first $n/2$ diagonal entries are
equal to $L_1$ and whose remaining entries are set to 1. This ellipsoidal function is denoted
by $f_2$.
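The two quadratic instances are easy to write down explicitly. The following Python sketch builds them from the diagonal of $Q$; the helper names are hypothetical, and the last lines illustrate, under the assumption of a gradient step with step size $1/L_1$, why a first-order method solves the sphere ($L_1 = 1$) in a single step.

```python
def make_quadratic(diag):
    """Return f(x) = 0.5 (x - 1)^T Q (x - 1) and its gradient Q (x - 1)
    for the diagonal matrix Q = diag(diag)."""
    def f(x):
        return 0.5 * sum(q * (xi - 1.0) ** 2 for q, xi in zip(diag, x))

    def grad(x):
        return [q * (xi - 1.0) for q, xi in zip(diag, x)]

    return f, grad


def ellipsoid_diagonal(n, L1):
    """First n/2 diagonal entries equal to L1, the remaining ones equal to 1."""
    half = n // 2
    return [float(L1)] * half + [1.0] * (n - half)


f1, grad1 = make_quadratic([1.0] * 4)                    # sphere, L1 = 1
f2, grad2 = make_quadratic(ellipsoid_diagonal(4, 1000))  # ellipsoid, L1 = 1000

x0 = [0.0] * 4
# one gradient step with step size 1 / L1 = 1 on the sphere
x1 = [xi - gi for xi, gi in zip(x0, grad1(x0))]
```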

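6.2.2. Nesterov's Smooth Functions. We consider Nesterov's smooth function
as introduced in Nesterov's textbook [28]. The generic version of this function
reads:

    $f_3(x) = \frac{L_1}{4}\left(\frac{1}{2}\left[x_1^2 + \sum_{i=1}^{n-1}(x_{i+1} - x_i)^2 + x_n^2\right] - x_1\right)$.    (6.2)

This function has derivative $\nabla f_3(x) = \frac{L_1}{4}(Ax - e_1)$, where $A$ is the tridiagonal matrix

    $A = \begin{pmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{pmatrix}$    and    $e_1 = (1, 0, \ldots, 0)^T$.

The optimal solution is located at:

    $x_i^* = 1 - \frac{i}{n+1}$,    for $i = 1, \ldots, n$.

For fixed dimension $n$, this function is strongly convex with parameter $m \approx \frac{L_1}{4(n+1)^2}$.
Thus, the condition number $L_1/m$ grows quadratically with the dimension. Adding a
regularization term leads, however, to a strongly convex function with parameter $m$
independent of the dimension. Given $L_1 > m > 0$, the regularized function reads:

    $f_4(x) = \frac{L_1 - m}{4}\left(\frac{1}{2}\left[x_1^2 + \sum_{i=1}^{n-1}(x_{i+1} - x_i)^2 + x_n^2\right] - x_1\right) + \frac{m}{2}\|x\|^2$.    (6.3)

This function is strongly convex with parameter $m$.
Its derivative is $\nabla f_4(x) = \left(\frac{L_1 - m}{4} A + mI\right)x - \frac{L_1 - m}{4} e_1$, and the optimal solution $x^*$
satisfies $\left(A + \frac{4m}{L_1 - m} I\right)x^* = e_1$.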
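As a sanity check on the formulas for Nesterov's smooth function, the following Python sketch evaluates the function and its gradient $\frac{L_1}{4}(Ax - e_1)$ and verifies numerically that the gradient vanishes at $x_i^* = 1 - i/(n+1)$. The function names are hypothetical, and the closed form of the function is the one consistent with the stated derivative.

```python
def nesterov_smooth(x, L1):
    """f(x) = L1/4 * (0.5 * [x_1^2 + sum (x_{i+1} - x_i)^2 + x_n^2] - x_1)."""
    n = len(x)
    quad = x[0] ** 2 + sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1)) + x[-1] ** 2
    return L1 / 4.0 * (0.5 * quad - x[0])


def nesterov_smooth_grad(x, L1):
    """Gradient L1/4 * (A x - e_1) with A = tridiag(-1, 2, -1)."""
    n = len(x)
    grad = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        ax_i = 2.0 * x[i] - left - right
        e1_i = 1.0 if i == 0 else 0.0
        grad.append(L1 / 4.0 * (ax_i - e1_i))
    return grad


n, L1 = 8, 1000.0
x_star = [1.0 - (i + 1) / (n + 1) for i in range(n)]
grad_at_optimum = nesterov_smooth_grad(x_star, L1)
f_at_optimum = nesterov_smooth(x_star, L1)
```

Note that the optimal value is not zero: since $x^{*T} A x^* = x_1^*$, one obtains $f(x^*) = -\frac{L_1}{8}\left(1 - \frac{1}{n+1}\right)$, which is why accuracies are reported relative to the optimal function value.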



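6.2.3. Funnel Function. We finally consider the following funnel-shaped function:

    $f_5(x) = \log\left(1 + 10\sqrt{(x - \mathbf{1})^T (x - \mathbf{1})}\right)$,    (6.4)

where $x \in \mathbb{R}^n$. The minimizer of this function is $x^* = \mathbf{1}$ with $f_5(x^*) = 0$. Its derivative
for $x \neq \mathbf{1}$ is $\nabla f_5(x) = 10/\left(1 + 10\sqrt{(x - \mathbf{1})^T (x - \mathbf{1})}\right) \cdot \operatorname{sign}(x - \mathbf{1})$, where for $n > 1$
the term $\operatorname{sign}(x - \mathbf{1})$ is to be read as the unit vector $(x - \mathbf{1})/\|x - \mathbf{1}\|$. A one-dimensional
graph of $f_5$ is shown in the left panel of Figure 7.1. The function $f_5$ arises from
a strictly monotone transformation of $f_1$ and thus belongs to the class of strictly
quasiconvex functions.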
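The funnel function and its gradient can be checked numerically. The Python sketch below is illustrative: the function names are hypothetical, and the gradient is implemented with the unit vector $(x - \mathbf{1})/\|x - \mathbf{1}\|$ in place of the sign, which is our reading of the formula in more than one dimension. A central finite difference serves as an independent check.

```python
import math


def funnel(x):
    """f(x) = log(1 + 10 * ||x - 1||)."""
    r = math.sqrt(sum((xi - 1.0) ** 2 for xi in x))
    return math.log(1.0 + 10.0 * r)


def funnel_grad(x):
    """Gradient 10 / (1 + 10 r) * (x - 1) / r with r = ||x - 1||, for x != 1."""
    r = math.sqrt(sum((xi - 1.0) ** 2 for xi in x))
    if r == 0.0:
        raise ValueError("gradient is undefined at the minimizer x* = 1")
    scale = 10.0 / (1.0 + 10.0 * r)
    return [scale * (xi - 1.0) / r for xi in x]


def central_difference(f, x, h=1e-6):
    """Numerical gradient, used only to cross-check funnel_grad."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad


point = [0.0, 2.0, 0.5]
analytic = funnel_grad(point)
numeric = central_difference(funnel, point)
```

In one dimension the unit vector reduces to the sign of $x - 1$, so the sketch agrees with the formula as printed.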

6.3. Numerical Optimization Results. To illustrate the performance of Random
Pursuit in comparison with the other randomized methods, we present and
discuss the key numerical results. All numerical tests follow the same protocol.
All algorithms use the starting point $x_0 = 0$ for all test functions. In order to
compare the performance of the different algorithms across different test functions, we
follow Nesterov's approach [21] and report relative solution accuracies with respect
to the scale $S \approx \frac{1}{2} L_1 R^2$, where $R := \|x_0 - x^*\|$ is the Euclidean distance between
starting point and optimal solution of the respective function. The properties of the
four convex and continuously differentiable test functions and the quasiconvex funnel
function, along with the upper bounds on $R^2$ and the corresponding scales $S$, are
summarized in Table 6.1.





       Name              function class    $L_1$   $m$                               $R^2$             $S$
$f_1$  Sphere            strongly convex   1       1                                 $n$               $\frac{1}{2} n$
$f_2$  Ellipsoid         strongly convex   1000    1                                 $n$               $50n$
$f_3$  Nesterov smooth   convex            1000    $\approx \frac{L_1}{4(n+1)^2}$    $\frac{n+1}{3}$   $500 \cdot \frac{n+1}{3}$
$f_4$  Nesterov strong   strongly convex   1000    1                                 n/1000 4          1000
$f_5$  Funnel            not convex        -       -                                 $n$               -

Table 6.1: Test functions with parameters $L_1$, $m$, $R$ and the used scale $S$.



Due to the inherent randomness of a single search run, we perform 25 runs for
each pair of problem instance and algorithm with different random number seeds. We
compare the different methods based on two record values: (i) the minimum, mean,
and maximum number of iterations (ITS) and (ii) the minimum, mean, and maximum
number of function evaluations (FES) needed to reach a certain solution accuracy.
While the former records serve as a means to compare the number of oracle calls in the
different methods, the latter only considers evaluations of the objective function
as relevant performance cost. It is evident that measuring the performance of the
algorithms in terms of oracle calls favors Random Pursuit because the line search
oracle "does more work" than an oracle that, for instance, provides a directional
derivative. For Random Gradient methods the number of FES is just twice the number
of ITS when a first-order finite difference scheme is used for directional derivatives.
For the $\mathcal{ES}$ algorithm the numbers of ITS and FES are identical. For Random Pursuit
methods the relation between ITS and FES depends on the specific test function,
the line search parameter $\mu$, and the actual implementation of the line search. Our
theoretical investigation suggests that the randomized schemes are a factor of $O(n)$
slower than the first-order $\mathcal{GM}$ algorithms. This is due to the reduced available
(directional) information in the randomized methods compared to the $n$-dimensional
gradient vector. For better comparison with $\mathcal{GM}$ and $\mathcal{GM}_{LS}$, we thus scale the
number of ITS of the randomized schemes by a factor of $1/n$.

6.3.1. Performance on the Quadratic Test Functions for $n \leq 1024$. We
first consider the two quadratic test functions in $n = 2^2, \ldots, 2^{10}$ dimensions. Table 6.2
summarizes the minimum, maximum, and mean number of ITS (in blocks of size
$n$) needed for each algorithm to reach the absolute accuracy $1.91 \cdot 10^{-6} S$ on the
sphere function $f_1$. For the first-order $\mathcal{GM}$ algorithms the absolute number of ITS
is reported. Three key observations can be made from these data. First, all zeroth-order
algorithms approach the theoretically expected linear scaling of the run time
with dimension for strongly convex functions for sufficiently large $n$ (for $n \geq 64$, e.g.,
the average number of ITS/$n$ becomes constant for the $\mathcal{RP}_\mu$ algorithms). Second,
no significant performance difference can be found between $\mathcal{RP}_\mu$ and its accelerated
version across all dimensions. The performance of the algorithm pair $\mathcal{RG}_\mu$/$\mathcal{FG}_\mu$ becomes
similar for $n \geq 128$. Third, the Random Pursuit algorithms outperform all other
zeroth-order methods in terms of number of ITS. Only the last observation changes
when the number of FES is considered. Table 6.3 summarizes the number of FES
(in blocks of size $n$) for all algorithms on $f_1$. We see that the $\mathcal{RP}_\mu$ algorithms
outperform the Random Gradient methods for low dimensions and perform equally
well for $n = 256$. For $n \geq 512$ the Random Gradient schemes become increasingly
superior to the Random Pursuit schemes. Remarkably, the adaptive step size $\mathcal{ES}$
algorithm outperforms all other methods across all dimensions. The data also reveal
that the line search oracle in the $\mathcal{RP}_\mu$ algorithms consumes on average four FES per
iteration for $n \leq 128$ with a slight increase to seven FES per iteration for $n = 1024$.

        $\mathcal{RP}_\mu$          $\mathcal{RG}_\mu$          $\mathcal{FG}_\mu$          $\mathcal{ARP}_\mu$         $\mathcal{ES}$              $\mathcal{GM}$, $\mathcal{GM}_{LS}$
   n    min  max  mean    min  max  mean    min  max  mean    min  max  mean    min  max  mean
  16    10   14   12      33   41   37      30   37   33      10   14   12      30   42   36      1
  32    11   14   12      31   36   33      28   35   31      11   16   12      33   41   37      1
  64    12   14   13      30   34   32      28   33   31      12   14   13      33   41   37      1
 128    12   14   13      30   32   31      29   32   31      12   14   13      35   40   37      1
 256    13   14   13      30   31   30      29   31   30      13   14   13      35   40   37      1
 512    13   13   13      30   31   30      30   31   30      13   14   13      36   38   37      1
1024    13   14   13      30   31   30      30   31   30      13   13   13      36   38   37      1

Table 6.2: Recorded minimum, maximum, and mean #ITS/$n$ on the sphere function
$f_1$ to reach a relative accuracy of $1.91 \cdot 10^{-6}$. For $\mathcal{GM}$ and $\mathcal{GM}_{LS}$ the absolute number
of ITS is recorded.

        $\mathcal{RP}_\mu$          $\mathcal{RG}_\mu$          $\mathcal{FG}_\mu$          $\mathcal{ARP}_\mu$         $\mathcal{ES}$
   n    min  max  mean    min  max  mean    min  max  mean    min  max  mean    min  max  mean
 128    50   56   53      59   64   62      58   64   61      51   56   54      35   40   37
 256    56   63   59      59   62   61      58   62   60      56   …    59      35   40   37
 512    64   …    67      59   62   60      59   62   60      64   70   67      36   38   37
1024    84   92   89      59   61   60      60   61   60      85   91   88      36   38   37

Table 6.3: Recorded minimum, maximum, and mean #FES/$n$ on the sphere function
$f_1$ to reach a relative accuracy of $1.91 \cdot 10^{-6}$.

We also observe that the gap between the minimum and maximum number of FES shrinks
with increasing dimension for all methods. Finally, the first-order schemes reach the
minimum, as expected, in a single iteration across all dimensions.

For the high-conditioned ellipsoidal function $f_2$ we observe a genuinely different
behavior of the different algorithms. Figure 6.1 shows for each algorithm the mean
number of FES (left panel) and ITS (right panel) in blocks of size $n$ needed to reach
the absolute accuracy $1.91 \cdot 10^{-6} S$ on $f_2$. The minimum, maximum, and mean number
of ITS and FES are reported in the Appendix in Tables B.2 and B.3, respectively.



[Figure 6.1: two panels plotted against dimension $n$ (4 to 1024); the legend includes Random Pursuit, Accelerated RP, and Random Gradient.]

Fig. 6.1: Average #FES/$n$ (left panel) and #ITS/$n$ (right panel) vs. dimension $n$ on
the ellipsoidal function $f_2$ to reach a relative accuracy of $1.91 \cdot 10^{-6}$ (#ITS for $\mathcal{GM}$).
Further data are available in Tables B.2 and B.3.



We again observe the theoretically expected linear scaling of the number of ITS
with dimension for sufficiently large $n$. The mean number of ITS now spans two orders
of magnitude for the different algorithms. Standard Random Pursuit outperforms the
$\mathcal{RG}_\mu$ and the $\mathcal{ES}$ algorithm. Moreover, the accelerated $\mathcal{RP}_\mu$ scheme outperforms the
$\mathcal{FG}_\mu$ scheme by a factor of 4. All methods show, however, an increased overall run
time due to the high condition number of the quadratic form. This is also reflected
in the increased number of FES that are needed by the line search oracle in the $\mathcal{RP}_\mu$
algorithms. The line search oracle now consumes on average 12-14 FES per iteration.

In terms of consumed FES budget we observe that Random Pursuit still outperforms
Random Gradient for small dimensions but needs a comparable number of
FES for $n \geq 64$ (around 30,000 FES in blocks of $n$). The $\mathcal{ES}$, the $\mathcal{ARP}_\mu$, and the
$\mathcal{FG}_\mu$ algorithm need an order of magnitude fewer FES. The accelerated $\mathcal{RP}_\mu$ is only
outperformed by the $\mathcal{FG}_\mu$ algorithm. The performance of the $\mathcal{ES}$ algorithm is again
remarkable given the fact that it does not need information about the parameters $L_1$
and $m$, which are of fundamental importance for the accelerated schemes.

6.3.2. Performance on the Full Benchmark Set for $n = 64$. We now illustrate
the behavior of the different algorithms on the full benchmark set for fixed
dimension $n = 64$. We observed similar qualitative behavior for all other dimensions.
Table 6.4 contains the number of ITS needed to reach the scale-dependent accuracy
$1.91 \cdot 10^{-6} S$ for all algorithms. We observe that Random Pursuit outperforms the $\mathcal{RG}_\mu$
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5954 
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2068 


2191 


2136 
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19004 


892 
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942 
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5766 
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4474 


2237 
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26 
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Table 6.4: Average #ITS/n to reach the relative accuracy 1.91 • 10 -6 in dimension 
$n = 64$. For $\mathcal{GM}$ and $\mathcal{GM}_{LS}$ the exact number of ITS is reported. Observed minimum
ITS across all (gradient-free) algorithms are marked in bold face for each function.



and the $\mathcal{ES}$ algorithm, and that the $\mathcal{ARP}_\mu$ algorithm outperforms all gradient-free
schemes in terms of number of ITS on all functions (with equal performance as Random
Pursuit on $f_1$ and $f_5$). We consistently observe an improved performance of all
algorithms on the regularized strongly convex function $f_4$ as compared to its convex
counterpart $f_3$. This expected behavior is most pronounced for the $\mathcal{ARP}_\mu$ scheme
where, on average, the number of ITS is reduced to $159/473 \approx 1/3$.

A comparison between the two gradient schemes reveals that $\mathcal{GM}_{LS}$ outperforms
the fixed step size gradient scheme on all test functions. The remarkable performance
of $\mathcal{GM}_{LS}$ on $f_2$ is due to the fact that the spectrum of the Hessian contains in
equal parts two different values (1 and $L_1$, respectively). A single line search along a
gradient direction is thus simultaneously optimal for $n/2$ directions of this function.
The $\mathcal{GM}_{LS}$ scheme thus reaches the target accuracy in as few as three steps. This
efficiency is lost as soon as the spectrum becomes sufficiently diverse (as indicated
by its performance on $f_3$). We also remark that the performance of the pairs $\mathcal{RP}_\mu$/$\mathcal{GM}_{LS}$
and $\mathcal{RG}_\mu$/$\mathcal{GM}$ is in full agreement with theory. We see on functions $f_3$/$f_4$
that $\mathcal{RP}_\mu$ is about a factor of $n$ slower than $\mathcal{GM}_{LS}$ due to unavailable gradient
information. The same is true for $\mathcal{RG}_\mu$/$\mathcal{GM}$, where $\mathcal{RG}_\mu$ is about $4n$ times slower than
$\mathcal{GM}$ due to the theoretically needed reduction of the optimal step length by a factor
of 1/4 [29].

For function $f_4$ we illustrate the convergence behavior of the different algorithms
in Figure 6.2. After a short algorithm-dependent initial phase we observe linear
convergence of all algorithms for fixed dimension, i.e., a constant reduction of the
logarithm of the distance to the minimum per iteration. We also observe that the
accelerated Random Pursuit consistently outperforms standard Random Pursuit for
all measured accuracies on $f_4$ (see Table S-4 in [38] for the corresponding numerical
data). This behavior is less pronounced for the function pair $f_1$/$f_2$, as shown in
Figure 6.3. On $f_1$ both Random Pursuit schemes have identical progress rates that
are also consistent with the theoretically predicted one. On $f_2$ Random Pursuit
outperforms the accelerated scheme for low accuracies (see also Table S-2 in [38] for the
numerical data) but is quickly outperformed due to the faster progress rate of the
accelerated scheme. We also observe that the theoretically predicted worst-case progress
rate (dotted line in the right panel of Figure 6.3) does not reflect the true progress
on this test function. Comparison of the numerical results on the function pair $f_1$/$f_5$
(see Figure 6.4) demonstrates the expected invariance under strictly monotone
transformations of the Random Pursuit algorithms and the $\mathcal{ES}$ scheme. These algorithms
enjoy the same convergence behavior (up to small random variations) to the target
solution while the Random Gradient schemes fail to converge to the target accuracy.
Note, however, that the numbers reported in, e.g., Table 6.4 are not identical for $f_1$ and $f_5$
because the stopping criteria used depend on the scale of the function values.
The convergence rates are the same, but more iterations are needed for $f_5$ because
the required accuracy is considerably smaller.

Fig. 6.2: Average accuracy (in log scale) vs. #ITS/$n$ for all algorithms on $f_4$ in
dimension $n = 64$. Further data are available in [38] in Table S-4.

Fig. 6.3: Numerical convergence rate of standard and accelerated Random Pursuit
on $f_1$ (left panel) and $f_2$ (right panel) in dimension $n = 64$. For both instances the
theoretically predicted worst-case progress rate (dotted line) is shown for comparison.
The rate of the $\mathcal{ARP}_\mu$ scheme is compared to the theoretically predicted convergence
rate of $\mathcal{FG}$ (dash-dotted line) [35].

Fig. 6.4: Numerical convergence rate of the $\mathcal{RP}_\mu$, the $\mathcal{ES}$, and the $\mathcal{RG}$ scheme on $f_1$
and $f_5$ in $n = 64$ dimensions. The accuracy is measured in terms of the logarithmic
distance to the optimum $\log(\|x_k - x^*\|_2)$.

We also report the performance of the different algorithms in terms of number
of FES needed to reach the target accuracy of $1.91 \cdot 10^{-6} S$ for the different test
functions. For all algorithms the minimum, maximum, and average number of FES
are recorded in Table 6.5. We observe that the $\mathcal{RP}_\mu$ algorithm outperforms the
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5766 
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n 


11629 


12482 
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17708 
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1557 


2134 


1825 


2651 


2854 


2751 


h 


338 


384 


360 














342 


399 


361 


73 


85 


78 



Table 6.5: Average #FES/$n$ to reach the relative accuracy $1.91 \cdot 10^{-6}$ in dimension
$n = 64$. Observed minimum #FES/$n$ across all algorithms are marked in bold face
for each function.



standard Random Gradient method on all tested functions. However, Random Pursuit
is not competitive compared to the accelerated schemes and the $\mathcal{ES}$ algorithm. The
accelerated $\mathcal{RP}_\mu$ scheme is only outperformed by the $\mathcal{FG}_\mu$ algorithm. The latter scheme
shows particularly good performance on the convex function $f_3$, with considerably
lower variance. For functions $f_2$-$f_4$ the $\mathcal{RP}_\mu$ algorithms need around 12-15 FES per
line search oracle call. We emphasize again that the performance of the adaptive step
size $\mathcal{ES}$ scheme is remarkable given the fact that it does not need any function-specific
parametrization. A comparison to the parameter-free Random Pursuit scheme shows
that $\mathcal{ES}$ needs around four times fewer FES on functions $f_2$-$f_4$.

We remark that Random Pursuit with discrete sampling, i.e., using the set of
signed unit vectors for sample generation (see Section 2.1), yields numerical results
on the present benchmark that are consistent with our theoretical predictions. We
observed improved performance of Random Pursuit with discrete sampling on the
function triplet $f_1$/$f_2$/$f_5$. This is to be expected, as the principal axes of these functions
coincide with the standard basis. Thus, algorithms that move along the standard
coordinate system are favored. On the function pair $f_3$/$f_4$ we do not see any significant
deviation from the presented performance results.

We also exemplify the influence of the parameter $\mu$ on $\mathcal{RP}_\mu$'s performance. We
choose the function $f_2$ as test instance because $\mathcal{RP}_\mu$ consumes the most FES on this
function. We vary $\mu$ between 1e-1 and 1e-10 and run the $\mathcal{RP}_\mu$ scheme 25 times to
reach a relative accuracy of $1.91 \cdot 10^{-6}$. The mean numbers of ITS and FES are reported in
Table 6.6.

param. $\mu$    #ITS/n    #FES/n
1e-1            1986      19824
1e-2            1982      19781
1e-3            2013      21435
1e-4            2017      26120
1e-5            2001      29071
1e-6            2001      29495
1e-7            2001      29537
1e-8            2001      29542
1e-9            2020      38993
1e-10           2020      39070

Table 6.6: Performance of $\mathcal{RP}_\mu$ on the ellipsoid function $f_2$ ($m = 1$, $L = 1000$,
$S = 3200$, $n = 64$) for different line search parameters $\mu$. Mean (of 25 repetitions)
number of ITS/$n$ and FES/$n$ to reach a relative accuracy of $1.91 \cdot 10^{-6}$ are reported.

We see that the choice of $\mu$ has almost no influence on the number of ITS
to reach the target accuracy, thus justifying the use of ITS as a meaningful performance
measure. The number of FES stays within the same order of magnitude, ranging from 19824
for 1e-1 to 39070 for 1e-10. We see that the number of FES for the standard setting
$\mu$ = 1e-5 is approximately in the middle of these extremes (29071 FES). This implies
that the qualitative picture of the reported performance comparison is still valid but
individual results for $\mathcal{RP}_\mu$ and $\mathcal{ARP}_\mu$ are improvable by optimally choosing $\mu$. An in-depth
analysis of the optimal function-dependent choice of the $\mu$ parameter is subject
to future studies.
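The cost of one line search oracle call follows directly from the two rows of Table 6.6, since both are reported in blocks of $n$. The short Python sketch below (variable names are ours) divides the mean #FES/$n$ by the mean #ITS/$n$ for each $\mu$.

```python
# Values copied from Table 6.6 (ellipsoid function, n = 64).
mu_values = ["1e-1", "1e-2", "1e-3", "1e-4", "1e-5",
             "1e-6", "1e-7", "1e-8", "1e-9", "1e-10"]
its_per_n = [1986, 1982, 2013, 2017, 2001, 2001, 2001, 2001, 2020, 2020]
fes_per_n = [19824, 19781, 21435, 26120, 29071, 29495, 29537, 29542, 38993, 39070]

# Average number of function evaluations per line search call.
fes_per_iteration = [fes / its for fes, its in zip(fes_per_n, its_per_n)]

# Spread of the iteration counts across all tested values of mu.
its_spread = max(its_per_n) / min(its_per_n)
```

With these numbers the line search costs roughly 10 function evaluations per call for the loosest tolerance and roughly 19 for the tightest, while the iteration counts differ by less than 2 percent.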

As a final remark we highlight that the present numerical results for the Random 
Gradient methods are fully consistent with the ones presented in Nesterov's paper 

7. Discussion and Conclusion. We have derived a convergence proof and 
convergence rates for Random Pursuit on convex functions. We have used a quadratic 
upper bound technique to bound the expected single-step progress of the algorithm. 
Assuming exact line search, this results in global linear convergence for strongly convex 
functions and convergence of the order 1/k for general convex functions. 

For line search oracles with relative error $\mu$ the same results have been obtained
with convergence rates reduced by a factor that depends on $\mu$. For inexact line search with
absolute error $\mu$, convergence can be established only up to an additive error term
depending on $\mu$, the properties of the function, and the dimensionality.

Random Pursuit needs a factor of $n$ more iterations than the standard (first-order)
Gradient Method. Jägersküpper showed that no better performance
can be expected for strongly convex functions [15]. He derived a lower bound
for algorithms of the form (1.3) where at each iteration the step size along the random
direction is chosen so as to minimize the distance to the minimum $x^*$. On sphere
functions $f(x) = (x - x^*)^T (x - x^*)$ Random Pursuit coincides with the described
scheme, thus achieving the lower bound.

The numerical experiments showed that (i) standard Random Pursuit is effective
on strongly convex functions with moderate condition number, and (ii) the accelerated
scheme is comparable to Nesterov's fast gradient method and outperforms the
$\mathcal{ES}$ algorithm. The experimental results also revealed that (i) $\mathcal{RP}_\mu$'s empirical
convergence is (as predicted by theory) $n$ times slower than that of the corresponding
gradient scheme with line search ($\mathcal{GM}_{LS}$), and (ii) both continuous and discrete
sampling can be employed in Random Pursuit. We confirmed the invariance of the $\mathcal{RP}_\mu$
algorithms and $\mathcal{ES}$ under monotone transformations of the objective functions on the
quasiconvex funnel-shaped function $f_5$, where Random Gradient algorithms fail. We
also highlighted the remarkable performance of the $\mathcal{ES}$ scheme given the fact that it
does not need any function-specific input parameters.

The present theoretical and experimental results hint at a number of potential
enhancements for standard Random Pursuit in future work. First, $\mathcal{RP}_\mu$'s convergence
rate depends on the function-specific parameter $L_1$ that bounds the curvature of the
objective function. Any reduction of this dependency would imply faster convergence
on a larger class of functions. The empirical results on the function pair $f_1$/$f_2$ (see
Tables S-1 and S-2 in [38]) also suggest that complicated accelerated schemes do
not present any significant advantage on functions with small constant $L_1$. It is
conceivable that Random Pursuit can incorporate a mechanism to learn second-order
information about the function "on the fly", thus improving the conditioning of the
original optimization problem and potentially reducing it to the $L_1 \approx 1$ case. This may
be possible using techniques from randomized Quasi-Newton approaches |2 1231 SZ] or 
differential geometry [7] . It is noteworthy that heuristic versions of such an adaptation 
mechanism have proved extremely useful in practice for adaptive step size algorithms 

Ha USES] 

Second, we have not considered Random Pursuit for constrained optimization
problems of the form

    $\min f(x)$ subject to $x \in \mathcal{K}$,    (7.1)

where $\mathcal{K} \subset \mathbb{R}^n$ is a convex set. The key challenge is how to treat iterates $x_{k+1} =
x_k + \text{LSAPPROX}(x_k, u) \cdot u$ generated by the line search oracle that are outside the
domain $\mathcal{K}$. A classic idea is to apply a projection operator $\pi_{\mathcal{K}}$ and use the resulting
$x'_{k+1} := \pi_{\mathcal{K}}(x_{k+1})$ as the next iterate. However, finding a projection onto a convex
set (except for simple bodies such as hyper-parallelepipeds) can be as difficult as
the original optimization problem. Moreover, it is an open question whether general
convergence can be ensured, and what convergence rates can be achieved. Another
possibility is to constrain the line search to the intersection of the line and the convex
body $\mathcal{K}$. In this case, one can only expect exponentially slow
convergence rates for this method. Consider the linear function $f(x) = \mathbf{1}^T x$ and
$\mathcal{K} = \mathbb{R}^n_+$. Once an iterate $x_k$ lies at the boundary $\partial\mathcal{K}$ of the domain, say the first
coordinate of $x_k$ is zero, then only directions $u$ with positive first coordinate may lead
to an improvement. As soon as a constant fraction of the coordinates are zero, the
probability of finding an improving direction is exponentially small. Karmanov [17]
proposed the following combination of projection and line search constraining: First,
a random point $y$ at some fixed distance from the current iterate is drawn uniformly at
random and then projected onto the set $\mathcal{K}$. A constrained line search is now performed
along the line through the current iterate $x_k$ and $\pi_{\mathcal{K}}(y)$. The convergence rate of this
method remains to be studied.
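The exponential effect in the orthant example can be made concrete. Suppose $k$ coordinates of the current iterate are zero. A line search restricted to the nonnegative orthant can only move along $u$ if the corresponding $k$ components of $u$ all share one sign, which for a uniformly random direction happens with probability $2^{1-k}$. This is an upper bound on the probability of finding an improving direction. The Python sketch below checks the bound by simulation; the function names and the sample size are our own choices.

```python
import random


def same_sign_fraction(k, samples=20000, seed=0):
    """Monte Carlo estimate of the probability that k fixed coordinates of a
    random direction all share one sign. The signs of a uniform direction
    are distributed like the signs of independent Gaussian coordinates, so
    only those k coordinates need to be sampled."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        positive = [rng.gauss(0.0, 1.0) > 0.0 for _ in range(k)]
        if all(positive) or not any(positive):
            hits += 1
    return hits / samples


def same_sign_bound(k):
    """Exact value 2^(1 - k) of the same-sign probability."""
    return 2.0 ** (1 - k)


estimate = same_sign_fraction(3)
```

For $k = n/2$ zero coordinates the bound equals $2^{1 - n/2}$, which for $n = 64$ is already below $10^{-9}$.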

Finally, we envision convergence guarantees and provable convergence rates for
Random Pursuit on more general function classes. The invariance of the line search
oracle under strictly monotone transformations of the objective function already implies
that Random Pursuit converges on certain strictly quasiconvex functions. It
also seems within reach to derive convergence guarantees for Random Pursuit on the class
of globally convex (or $\delta$-convex) functions [12] or on convex functions with bounded
perturbations [30] (see right panel of Figure 7.1 for the graph of such an instance).
This may be achieved by appropriately adapting line search methods to these function
classes. In summary, we believe that the theoretical and experimental results
on Random Pursuit represent a promising first step toward the design of competitive
derivative-free optimization methods that are easy to implement, possess theoretical
convergence guarantees, and are useful in practice.

Fig. 7.1: Left panel: Graph of function $f_5$ in 1D. Right panel: Graph of a globally
convex function.
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Appendix A. Lemma. 

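Lemma A.1. Let $\{f_t\}_{t \in \mathbb{N}}$ be a sequence with $f_t \in \mathbb{R}_+$. Suppose

    $f_{t+1} \leq (1 - \theta/t) f_t + C\theta^2/t^2 + D$,    for $t \geq 1$,

for some constants $\theta > 1$, $C > 0$ and $D \geq 0$. Then it follows by induction that

    $f_t \leq Q(\theta)/t + (t-1)D$,

where $Q(\theta) = \max\{\theta^2 C/(\theta - 1), f_1\}$.

A very similar result was stated without proof in [27], and Hazan [10] uses the
same result.

Proof. For $t = 1$ it holds that $f_1 \leq Q(\theta)$ by definition of $Q(\theta)$. Assume that the
result holds for $t \geq 1$. If $Q(\theta) = \theta^2 C/(\theta - 1)$ then we deduce:

    $f_{t+1} \leq \frac{(t-\theta)\theta^2 C}{(\theta-1)t^2} + \frac{C\theta^2}{t^2} + \frac{(t-\theta)(t-1)D}{t} + D$
    $\phantom{f_{t+1}} = \frac{\theta^2 C(t-1)}{(\theta-1)t^2} + \frac{D(t^2 - \theta(t-1))}{t} \leq \frac{\theta^2 C}{(\theta-1)(t+1)} + tD$.

If on the other hand $Q(\theta) = f_1$, then

    $(\theta - 1) f_1 \geq \theta^2 C$,

and it follows

    $f_{t+1} \leq \frac{(t-\theta) f_1}{t^2} + \frac{C\theta^2}{t^2} + \frac{(t-\theta)(t-1)D}{t} + D$
    $\phantom{f_{t+1}} = \frac{(t-1) f_1}{t^2} + \frac{\theta^2 C - (\theta-1) f_1}{t^2} + \frac{D(t^2 - \theta(t-1))}{t} \leq \frac{f_1}{t+1} + tD$.    □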



Appendix B. Tables. 

B.1. Initial $\sigma_0$ of the $\mathcal{ES}$ algorithm for all test functions. Table B.1 reports
the empirically determined optimal initial step sizes $\sigma_0$ used as input to the $\mathcal{ES}$
algorithm.

 dim    $f_1$       $f_2$       $f_3$         $f_4$
   4    0.79158     1.3897      0.2054        0.20395
   8    0.49167     0.78761     0.08922       0.088145
  16    0.32692     0.49500     0.04134       0.041273
  32    0.22292     0.32547     0.019911      0.019905
  64    0.15542     0.22243     0.0097212     0.0097127
 128    0.10925     0.15638     0.0048305     0.0048335
 256    0.076658    0.10902     0.0024171     0.0024114
 512    0.054339    0.076568    0.0012012     0.0012006
1024    0.038367    0.054173    0.00060284    0.00060223

Table B.1: The initial values of the step size $\sigma_0$ for (1+1)-ES on the test functions for
various dimensions.



B.2. Data for the ellipsoid test function for $n \le 1024$. Tables B.2 and B.3 
report the numerical data used to produce Figure 6.1.

Table B.2: Ellipsoid function $f_2$ to accuracy $1.91 \cdot 10^{-6}$, S 
($\mathcal{GM}$ and $\mathcal{GM}_{LS}$: #ITS). 

Table B.3: Ellipsoid function $f_2$ to accuracy $1.91 \cdot 10^{-6}$, S = 50n, L = 1000. #FES/n. 
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Appendix C. Exemplary Matlab Codes. 
C.1. Accelerated Random Pursuit. 

function [fval, x, funeval] = rp_acc(fitfun, xstart, N, mu, m, L)
% Algorithm Accelerated Random Pursuit
% fitfun   name or function handle
% N        number of iterations
% mu       line search accuracy
% funeval  #function evaluations
% m, L     parameters for quadratic upper and lower bounds

% line search parameters
opts = optimset('Display', 'off', 'LargeScale', 'off', 'TolX', mu);

funeval = 0;
x = xstart;
n = length(xstart);
th = 1/(L*n^2);
ga = m;
v = x;

for i = 1:N
    p = [1/th, ga - m, -ga];
    be = max(roots(p));
    de = (be * ga) / (ga + be * m);
    y = (1 - de) * x + de * v;
    ga = (1 - be) * ga + be * m;
    la = be / ga * m;

    d = randn(size(xstart)); d = d/norm(d);
    [sigma, fval, ~, infos] = ...
        fminunc(@(sigma) feval(fitfun, y + sigma*d), 0, opts);
    funeval = funeval + infos.funcCount;

    x = y + sigma * d;
    v = (1 - la) * v + la * y + sigma/(be*n) * d;
end

Fig. S-1: Matlab code for algorithm $\mathcal{ARP}_\mu$ 
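The line `be = max(roots(p))` in the listing for Accelerated Random Pursuit selects the positive root of a quadratic. The following is a sketch of what that computes, where the Greek symbols $\theta, \gamma, \beta, \delta, \lambda$ are shorthand introduced here for the code variables `th`, `ga`, `be`, `de`, `la` (an assumption about notation, not taken from the listing). In exact arithmetic and for $m > 0$, the initialization `ga = m` appears to make all of these coefficients constant over the iterations:

```latex
\frac{\beta^2}{\theta} + (\gamma - m)\,\beta - \gamma = 0
\quad\Longrightarrow\quad
\beta = \frac{\theta}{2}\left( -(\gamma - m) + \sqrt{(\gamma - m)^2 + \frac{4\gamma}{\theta}} \right).

% With \gamma = m initially, the update \gamma \leftarrow (1-\beta)\gamma + \beta m
% leaves \gamma = m unchanged, so with \theta = 1/(L n^2):
\beta = \sqrt{m\,\theta} = \frac{1}{n}\sqrt{\frac{m}{L}}, \qquad
\delta = \frac{\beta\gamma}{\gamma + \beta m} = \frac{\beta}{1+\beta}, \qquad
\lambda = \frac{\beta m}{\gamma} = \beta .
```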
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C.2. Random Pursuit. 



function [fval, x, funeval] = rp(fitfun, xstart, N, mu)
% Algorithm Random Pursuit
% fitfun   name or function handle
% N        number of iterations
% mu       line search accuracy
% funeval  #function evaluations

% line search parameters
opts = optimset('Display', 'off', 'LargeScale', 'off', 'TolX', mu);

funeval = 0;
x = xstart;

for i = 1:N
    d = randn(size(xstart)); d = d/norm(d);
    [sigma, fval, ~, infos] = ...
        fminunc(@(sigma) feval(fitfun, x + sigma*d), 0, opts);
    funeval = funeval + infos.funcCount;
    x = x + sigma*d;
end

Fig. S-2: Matlab code for algorithm $\mathcal{RP}_\mu$ 
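To see the Random Pursuit iteration in isolation, the following minimal sketch runs it on a convex quadratic, where the one-dimensional problem has a closed-form solution. The closed-form step replaces the `fminunc` call of the listing, which is an assumption made here so that no toolbox is needed; the matrix `A`, the starting point, and the iteration count are hypothetical illustration values.

```matlab
% Random Pursuit on a convex quadratic f(x) = 0.5*x'*A*x.
% Assumption: the exact minimizer along d replaces the fminunc line search.
n = 8;
A = diag(linspace(1, 10, n));     % hypothetical test matrix (condition number 10)
f = @(x) 0.5 * (x' * A * x);
x = ones(n, 1);                   % hypothetical starting point
f0 = f(x);
for i = 1:200
    d = randn(n, 1); d = d / norm(d);        % random unit direction
    sigma = -(d' * A * x) / (d' * A * d);    % argmin over sigma of f(x + sigma*d)
    x = x + sigma * d;
end
fprintf('f(x0) = %g, f(xN) = %g\n', f0, f(x));
```

Because each step minimizes exactly along the sampled direction, the function value cannot increase from one iterate to the next.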



C.3. Random Gradient. 



function [fval, x] = rg(fitfun, xstart, N, eps, L)
% Random Gradient [Nesterov 2011]
% fitfun  name or function handle
% N       number of iterations
% eps     finite difference parameter
% L       quadratic upper bound

x = xstart;
n = length(xstart);
h = 1/(4 * L * (n+4));

for i = 1:N
    d = randn(size(xstart));
    g = (feval(fitfun, x + eps*d) - feval(fitfun, x)) / eps;
    x = x - h * g * d;
end

fval = feval(fitfun, x);

Fig. S-3: Matlab code for algorithm $\mathcal{RG}$ 
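The scalar `g` in the Random Gradient listing is a forward finite difference along the Gaussian direction `d`. As a small sanity check under stated assumptions, the sketch below evaluates it on $f(x) = \tfrac{1}{2}\|x\|^2$, for which the quotient equals $x^T d + \tfrac{\epsilon}{2}\|d\|^2$ exactly. The point `x`, the name `eps_fd`, and its value are hypothetical choices for illustration.

```matlab
% Forward-difference directional derivative, checked on f(x) = 0.5*||x||^2,
% where the difference quotient equals x'*d + (eps_fd/2)*||d||^2.
f = @(x) 0.5 * (x' * x);
x = (1:5)';                 % hypothetical evaluation point
d = randn(5, 1);            % Gaussian direction, not normalized
eps_fd = 1e-6;              % hypothetical finite-difference parameter
g = (f(x + eps_fd*d) - f(x)) / eps_fd;
g_exact = x' * d + 0.5 * eps_fd * (d' * d);
fprintf('g = %.8f, closed form = %.8f\n', g, g_exact);
```

The two numbers agree up to floating-point cancellation error, which grows as `eps_fd` shrinks.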
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C.4. Accelerated Random Gradient. 



function [fval, x] = rg_acc(fitfun, xstart, N, eps, m, L)
% Accelerated Random Gradient [Nesterov 2011]
% fitfun  name or function handle
% N       number of iterations
% eps     finite difference parameter
% m, L    parameters for quadratic upper and lower bounds

x = xstart;
n = length(xstart);
h = 1/(4 * L * (n+4));
th = 1/(16 * L * (n+1)^2);
ga = m;
v = x;

for i = 1:N
    p = [1/th, ga - m, -ga];
    be = max(roots(p));
    de = (be * ga) / (ga + be * m);
    y = (1 - de) * x + de * v;
    ga = (1 - be) * ga + be * m;
    la = be / ga * m;

    d = randn(size(xstart));
    g = (feval(fitfun, x + eps*d) - feval(fitfun, x)) / eps;
    x = y - h * g * d;
    v = (1 - la) * v + la * y - th / be * g * d;
end

fval = feval(fitfun, x);

Fig. S-4: Matlab code for algorithm $\mathcal{FG}$ 
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C.5. (1+1)-ES. 



function [fval, x] = es(fitfun, xstart, N, sigma)
% (1+1)-ES
% fitfun   name or function handle
% N        number of iterations
% sigma    initial stepsize
% funeval  #function evaluations

fval = feval(fitfun, xstart);
x = xstart;
ss = exp(1/3);
ff = exp(1/3 * (-0.27)/(1-0.27));

for i = 1:N
    d = randn(size(xstart));
    f = feval(fitfun, x + sigma*d);
    if f < fval
        x = x + sigma * d; fval = f; sigma = sigma * ss;
    else
        sigma = sigma * ff;
    end
end





Fig. S-5: Matlab code for algorithm ES 
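The two constants `ss` and `ff` in the (1+1)-ES listing can be read as a success-rate rule. In the short calculation below, $p$ denotes the fraction of successful iterations, a symbol introduced here only for illustration. It suggests that the step size is stationary on a logarithmic scale exactly when $p = 0.27$, grows when successes are more frequent, and shrinks otherwise:

```latex
\mathbb{E}\left[\ln\frac{\sigma_{\text{new}}}{\sigma}\right]
  = p\,\ln(\mathtt{ss}) + (1-p)\,\ln(\mathtt{ff})
  = p \cdot \frac{1}{3} + (1-p)\cdot\frac{1}{3}\cdot\frac{-0.27}{1-0.27}
  = \frac{1}{3}\cdot\frac{p - 0.27}{1 - 0.27} .
```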



Appendix D. Tables. 

D.1. Number of iterations for increasing accuracy for n = 64. Tables S-1 
to S-5 summarize the number of iterations needed to achieve a corresponding relative 
accuracy (acc) for fixed dimension n = 64. 
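The accuracy levels in the `acc.` column of the tables appear to halve from row to row. As an observation (not stated in the text), they match $2^{-k}$ for $k = 4, \dots, 19$ to the printed precision. A minimal sketch that regenerates the sixteen levels under that assumption:

```matlab
% Assumption: the tabulated accuracies are the powers 2^(-k), k = 4..19.
k = 4:19;
acc = 2 .^ (-k);
fprintf('%.2e\n', acc);     % prints one accuracy level per line
```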



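| acc. | $\mathcal{RP}_\mu$ min | $\mathcal{RP}_\mu$ max | $\mathcal{RP}_\mu$ mean | $\mathcal{RG}$ min | $\mathcal{RG}$ max | $\mathcal{RG}$ mean | $\mathcal{FG}$ min | $\mathcal{FG}$ max | $\mathcal{FG}$ mean | $\mathcal{ARP}_\mu$ min | $\mathcal{ARP}_\mu$ max | $\mathcal{ARP}_\mu$ mean | $\mathcal{ES}$ min | $\mathcal{ES}$ max | $\mathcal{ES}$ mean | $\mathcal{GM}$ | $\mathcal{GM}_{LS}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $6.25 \cdot 10^{-2}$ | 2 | 3 | 3 | 6 | 8 | 7 | 6 | 8 | 7 | 2 | 3 | 3 | 6 | 9 | 8 | 1 | |
| $3.12 \cdot 10^{-2}$ | 3 | 4 | 3 | 8 | 10 | 8 | 8 | 10 | 8 | 3 | 4 | 3 | 8 | 12 | 10 | 1 | |
| $1.56 \cdot 10^{-2}$ | 4 | 5 | 4 | 9 | 12 | 10 | 9 | 12 | 10 | 4 | 5 | 4 | 10 | 14 | 12 | 1 | |
| $7.81 \cdot 10^{-3}$ | 4 | 6 | 5 | 11 | 13 | 12 | 11 | 13 | 12 | 4 | 5 | 5 | 12 | 16 | 14 | 1 | |
| $3.91 \cdot 10^{-3}$ | 5 | 6 | 5 | 13 | 15 | 14 | 13 | 15 | 14 | 5 | 6 | 5 | 14 | 18 | 16 | 1 | |
| $1.95 \cdot 10^{-3}$ | 5 | 7 | 6 | 14 | 17 | 15 | 14 | 17 | 15 | 5 | 7 | 6 | 15 | 20 | 18 | 1 | |
| $9.77 \cdot 10^{-4}$ | 6 | 8 | | 16 | 18 | 17 | 16 | 18 | 17 | 6 | 8 | 7 | 17 | 22 | 20 | 1 | |
| $4.88 \cdot 10^{-4}$ | 7 | 9 | 7 | 18 | 20 | 19 | 18 | 20 | 19 | 7 | 9 | 7 | 18 | 24 | 21 | 1 | |
| $2.44 \cdot 10^{-4}$ | 7 | 9 | 8 | 19 | 22 | 20 | 19 | 22 | 20 | 7 | 9 | 8 | 20 | 26 | 23 | 1 | |
| $1.22 \cdot 10^{-4}$ | 8 | 10 | 9 | 21 | 24 | 22 | 21 | 24 | 22 | 8 | 10 | 9 | 22 | 28 | 25 | 1 | |
| $6.10 \cdot 10^{-5}$ | 9 | 10 | 10 | 22 | 26 | 24 | 22 | 26 | 24 | 9 | 11 | 9 | 23 | 31 | 27 | 1 | |
| $3.05 \cdot 10^{-5}$ | 9 | 11 | 10 | 24 | 28 | 25 | 24 | 28 | 25 | 10 | 11 | 10 | 26 | 32 | 29 | 1 | |
| $1.53 \cdot 10^{-5}$ | 10 | 12 | 11 | 25 | 29 | 27 | 25 | 29 | 27 | 10 | 12 | 11 | 28 | 35 | 31 | 1 | |
| $7.63 \cdot 10^{-6}$ | 11 | 13 | 12 | 27 | 31 | 29 | 27 | 31 | 29 | 11 | 13 | 12 | 29 | 36 | 33 | 1 | |
| $3.81 \cdot 10^{-6}$ | 11 | 13 | 12 | 28 | 32 | 30 | 28 | 32 | 30 | 12 | 13 | 12 | 31 | 38 | 35 | 1 | |
| $1.91 \cdot 10^{-6}$ | 12 | 14 | 13 | 30 | 34 | 32 | 30 | 34 | 32 | 12 | 14 | 13 | 33 | 41 | 37 | 1 | |

Table S-1: Sphere function $f_1$, m = 1, L = 1, S = 32, n = 64. #ITS/n, ($\mathcal{GM}$, $\mathcal{GM}_{LS}$: 
#ITS). 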
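| acc. | $\mathcal{RP}_\mu$ min | $\mathcal{RP}_\mu$ max | $\mathcal{RP}_\mu$ mean | $\mathcal{RG}$ min | $\mathcal{RG}$ max | $\mathcal{RG}$ mean | $\mathcal{FG}$ min | $\mathcal{FG}$ max | $\mathcal{FG}$ mean | $\mathcal{ARP}_\mu$ min | $\mathcal{ARP}_\mu$ max | $\mathcal{ARP}_\mu$ mean | $\mathcal{ES}$ min | $\mathcal{ES}$ max | $\mathcal{ES}$ mean | $\mathcal{GM}$ | $\mathcal{GM}_{LS}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $6.25 \cdot 10^{-2}$ | 2 | 3 | 2 | 9 | 11 | 10 | 5 | 18 | 16 | 64 | 79 | 71 | 5 | 9 | 6 | 1 | 1 |
| $3.12 \cdot 10^{-2}$ | 2 | 3 | 3 | 11 | 13 | 12 | 17 | 21 | 18 | 70 | 86 | 77 | 6 | 10 | 8 | 1 | 1 |
| $1.56 \cdot 10^{-2}$ | 3 | 4 | 3 | 13 | 15 | 14 | 31 | 359 | 261 | 77 | 93 | 84 | 7 | 19 | 10 | 1 | 1 |
| $7.81 \cdot 10^{-3}$ | 4 | 132 | 53 | 15 | 20 | 17 | 340 | 423 | 380 | 84 | 101 | 91 | 18 | 465 | 268 | 1 | 1 |
| $3.91 \cdot 10^{-3}$ | 104 | 298 | 210 | 381 | 1103 | 677 | 404 | 484 | 444 | 92 | 125 | 107 | 481 | 928 | 723 | 124 | 2 |
| $1.95 \cdot 10^{-3}$ | 273 | 460 | 373 | 1863 | 2578 | 2150 | 464 | 544 | 504 | 101 | 140 | 129 | 936 | 1410 | 1177 | 470 | 2 |
| $9.77 \cdot 10^{-4}$ | 433 | 624 | 536 | 3340 | 4062 | 3624 | 521 | 601 | 562 | 129 | 153 | 143 | 1376 | 1869 | 1631 | 817 | 2 |
| $4.88 \cdot 10^{-4}$ | 598 | 787 | 700 | 4815 | 5536 | 5094 | 576 | 657 | 618 | 142 | 164 | 153 | 1826 | 2325 | 2085 | 1163 | 2 |
| $2.44 \cdot 10^{-4}$ | 761 | 954 | 862 | 6280 | 7018 | 6564 | 630 | 712 | 673 | 153 | 174 | 162 | 2264 | 2774 | 2538 | 1509 | 2 |
| $1.22 \cdot 10^{-4}$ | 921 | 1118 | 1024 | 7754 | 8488 | 8036 | 683 | 767 | 727 | 161 | 183 | 170 | 2732 | 3239 | 2994 | 1856 | 2 |
| $6.10 \cdot 10^{-5}$ | 1080 | 1284 | 1187 | 9232 | 9961 | 9508 | 736 | 820 | 781 | 168 | 191 | 178 | 3172 | 3690 | 3447 | 2202 | 2 |
| $3.05 \cdot 10^{-5}$ | 1243 | 1446 | 1350 | 10712 | 11439 | 10980 | 787 | 873 | 833 | 176 | 199 | 187 | 3635 | 4138 | 3905 | 2549 | 3 |
| $1.53 \cdot 10^{-5}$ | 1406 | 1607 | 1512 | 12179 | 12906 | 12453 | 839 | 925 | 885 | 189 | 214 | 201 | 4083 | 4593 | 4361 | 2895 | 3 |
| $7.63 \cdot 10^{-6}$ | 1570 | 1766 | 1675 | 13654 | 14388 | 13923 | 890 | 977 | 936 | 203 | 227 | 217 | 4537 | 5042 | 4819 | 3241 | 3 |
| $3.81 \cdot 10^{-6}$ | 1732 | 1928 | 1837 | 15130 | 15854 | 15395 | 940 | 1029 | 988 | 219 | 238 | 230 | 4989 | 5492 | 5273 | 3588 | 3 |
| $1.91 \cdot 10^{-6}$ | 1899 | 2096 | 2001 | 16601 | 17333 | 16868 | 990 | 1079 | 1038 | 233 | 250 | 242 | 5451 | 5954 | 5729 | 3934 | 3 |

Table S-2: Ellipsoid function $f_2$, m = 1, L = 1000, S = 3200, n = 64. #ITS/n, ($\mathcal{GM}$, 
$\mathcal{GM}_{LS}$: #ITS). 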









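| acc. | $\mathcal{RP}_\mu$ min | $\mathcal{RP}_\mu$ max | $\mathcal{RP}_\mu$ mean | $\mathcal{RG}$ min | $\mathcal{RG}$ max | $\mathcal{RG}$ mean | $\mathcal{FG}$ min | $\mathcal{FG}$ max | $\mathcal{FG}$ mean | $\mathcal{ARP}_\mu$ min | $\mathcal{ARP}_\mu$ max | $\mathcal{ARP}_\mu$ mean | $\mathcal{ES}$ min | $\mathcal{ES}$ max | $\mathcal{ES}$ mean | $\mathcal{GM}$ | $\mathcal{GM}_{LS}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $6.25 \cdot 10^{-2}$ | | | | | | | | | | | | | | | | 1 | 1 |
| $3.12 \cdot 10^{-2}$ | | | | | | | | | | | | | | | | 1 | 1 |
| $1.56 \cdot 10^{-2}$ | | | | | | | | | | | | | | | | 1 | 1 |
| $7.81 \cdot 10^{-3}$ | 1 | 1 | 1 | 3 | 5 | 4 | 2 | 4 | 3 | | 1 | 1 | 2 | 4 | 3 | 1 | 1 |
| $3.91 \cdot 10^{-3}$ | 3 | 4 | 3 | 18 | 23 | 21 | 8 | 13 | 10 | 3 | 19 | 8 | 6 | 10 | 9 | | 3 |
| $1.95 \cdot 10^{-3}$ | 8 | 12 | 10 | 74 | 84 | 79 | 29 | 40 | 34 | 12 | 66 | 25 | 22 | 31 | 26 | 19 | 10 |
| $9.77 \cdot 10^{-4}$ | 26 | 37 | 31 | 256 | 283 | 269 | 55 | 84 | 70 | 22 | 75 | 41 | 73 | 102 | 86 | 64 | 32 |
| $4.88 \cdot 10^{-4}$ | 79 | 109 | 92 | 790 | 837 | 811 | 104 | 152 | 130 | 33 | 132 | 61 | 233 | 292 | 257 | 191 | 96 |
| $2.44 \cdot 10^{-4}$ | 200 | 264 | 228 | 1993 | 2065 | 2022 | 164 | 258 | 201 | 35 | 173 | 87 | 577 | 700 | 633 | 477 | 239 |
| $1.22 \cdot 10^{-4}$ | 405 | 501 | 453 | 3955 | 4053 | 4004 | 224 | 328 | 279 | 58 | 199 | 127 | 1142 | 1344 | 1249 | 945 | 473 |
| $6.10 \cdot 10^{-5}$ | 665 | 778 | 723 | 6349 | 6465 | 6412 | 293 | 382 | 348 | 74 | 255 | 151 | 1867 | 2101 | 1998 | 1512 | 756 |
| $3.05 \cdot 10^{-5}$ | 948 | 1060 | 1005 | 8849 | 8964 | 8917 | 369 | 427 | 402 | 88 | 391 | 213 | 2641 | 2891 | 2780 | 2101 | 1051 |
| $1.53 \cdot 10^{-5}$ | 1229 | 1345 | 1288 | 11375 | 11482 | 11435 | 397 | 613 | 442 | 96 | 449 | 265 | 3406 | 3692 | 3563 | 2694 | 1347 |
| $7.63 \cdot 10^{-6}$ | 1509 | 1619 | 1570 | 13886 | 14016 | 13958 | 450 | 894 | 677 | 129 | 533 | 342 | 4223 | 4461 | 4348 | 3288 | 1644 |
| $3.81 \cdot 10^{-6}$ | 1792 | 1902 | 1853 | 16401 | 16545 | 16482 | 463 | 935 | 887 | 188 | 632 | 400 | 5008 | 5254 | 5133 | 3881 | 1940 |
| $1.91 \cdot 10^{-6}$ | 2068 | 2191 | 2136 | 18922 | 19075 | 19004 | 892 | 970 | 942 | 192 | 678 | 473 | 5766 | 6050 | 5916 | 4474 | 2237 |

Table S-3: Nesterov smooth $f_3$, L = 1000, S = 10833, n = 64. #ITS/n, ($\mathcal{GM}$, 
$\mathcal{GM}_{LS}$: #ITS). 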



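| acc. | $\mathcal{RP}_\mu$ min | $\mathcal{RP}_\mu$ max | $\mathcal{RP}_\mu$ mean | $\mathcal{RG}$ min | $\mathcal{RG}$ max | $\mathcal{RG}$ mean | $\mathcal{FG}$ min | $\mathcal{FG}$ max | $\mathcal{FG}$ mean | $\mathcal{ARP}_\mu$ min | $\mathcal{ARP}_\mu$ max | $\mathcal{ARP}_\mu$ mean | $\mathcal{ES}$ min | $\mathcal{ES}$ max | $\mathcal{ES}$ mean | $\mathcal{GM}$ | $\mathcal{GM}_{LS}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $6.25 \cdot 10^{-2}$ | 1 | 2 | 1 | 7 | 9 | 8 | 4 | 5 | 5 | 1 | 6 | 2 | 3 | 5 | 4 | 2 | 1 |
| $3.12 \cdot 10^{-2}$ | 3 | 5 | 4 | 25 | 31 | 28 | 10 | 16 | 13 | 5 | 19 | 11 | 8 | 13 | 11 | | 4 |
| $1.56 \cdot 10^{-2}$ | 8 | 12 | 10 | 79 | 90 | 82 | 27 | 37 | 33 | 9 | 45 | 22 | 19 | 35 | 27 | 19 | 10 |
| $7.81 \cdot 10^{-3}$ | 19 | 31 | 24 | 199 | 219 | 204 | 46 | 70 | 59 | 21 | 53 | 32 | 52 | 76 | 65 | 48 | 24 |
| $3.91 \cdot 10^{-3}$ | 43 | 62 | 50 | 415 | 458 | 432 | 74 | 107 | 88 | 25 | 58 | 41 | 109 | 152 | 135 | 102 | 51 |
| $1.95 \cdot 10^{-3}$ | 82 | 106 | 90 | 772 | 824 | 791 | 104 | 140 | 118 | 39 | 88 | 53 | 214 | 272 | 247 | 187 | 94 |
| $9.77 \cdot 10^{-4}$ | 131 | 166 | 146 | 1248 | 1341 | 1284 | 138 | 177 | 157 | 49 | 96 | 65 | 352 | 447 | 399 | 303 | 152 |
| $4.88 \cdot 10^{-4}$ | 195 | 236 | 214 | 1841 | 1965 | 1900 | 169 | 216 | 191 | 52 | 109 | 73 | 533 | 638 | 587 | 449 | 225 |
| $2.44 \cdot 10^{-4}$ | 267 | 317 | 295 | 2540 | 2694 | 2621 | 195 | 259 | 228 | 64 | 114 | 82 | 740 | 875 | 810 | 619 | 310 |
| $1.22 \cdot 10^{-4}$ | 352 | 409 | 384 | 3326 | 3513 | 3420 | 244 | 293 | 266 | 68 | 127 | 95 | 976 | 1132 | 1059 | 807 | 404 |
| $6.10 \cdot 10^{-5}$ | 441 | 506 | 479 | 4174 | 4387 | 4277 | 282 | 335 | 306 | 89 | 132 | 107 | 1233 | 1403 | 1325 | 1009 | 505 |
| $3.05 \cdot 10^{-5}$ | 539 | 605 | 579 | 5057 | 5288 | 5168 | 321 | 376 | 342 | 96 | 160 | 119 | 1508 | 1681 | 1603 | 1219 | 610 |
| $1.53 \cdot 10^{-5}$ | 642 | 715 | 682 | 5969 | 6206 | 6079 | 351 | 402 | 376 | 108 | 168 | 130 | 1788 | 1974 | 1885 | 1434 | 717 |
| $7.63 \cdot 10^{-6}$ | 740 | 816 | 786 | 6893 | 7138 | 7001 | 383 | 430 | 409 | 117 | 177 | 138 | 2071 | 2269 | 2173 | 1650 | 826 |
| $3.81 \cdot 10^{-6}$ | 845 | 920 | 890 | 7816 | 8064 | 7926 | 423 | 453 | 435 | 127 | 184 | 148 | 2357 | 2568 | 2462 | 1868 | 935 |
| $1.91 \cdot 10^{-6}$ | 954 | 1023 | 995 | 8727 | 8995 | 8854 | 441 | 534 | 458 | 137 | 188 | 159 | 2651 | 2854 | 2751 | 2086 | 1044 |

Table S-4: Nesterov strongly convex function $f_4$, m = 1, L = 1000, S = 1000, n = 64. 
#ITS/n, ($\mathcal{GM}$, $\mathcal{GM}_{LS}$: #ITS). 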
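| acc. | $\mathcal{RP}_\mu$ min | $\mathcal{RP}_\mu$ max | $\mathcal{RP}_\mu$ mean | $\mathcal{RG}$ (min max mean) | $\mathcal{FG}$ (min max mean) | $\mathcal{ARP}_\mu$ min | $\mathcal{ARP}_\mu$ max | $\mathcal{ARP}_\mu$ mean | $\mathcal{ES}$ min | $\mathcal{ES}$ max | $\mathcal{ES}$ mean | $\mathcal{GM}$ | $\mathcal{GM}_{LS}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $6.25 \cdot 10^{-2}$ | 4 | 6 | 5 | | | 4 | 6 | 5 | 13 | 17 | 14 | | 1 |
| $3.12 \cdot 10^{-2}$ | 7 | 9 | 7 | | | 7 | 8 | | 19 | 25 | 22 | | 1 |
| $1.56 \cdot 10^{-2}$ | 8 | 11 | 9 | | | 8 | 10 | 9 | 24 | 32 | 27 | | |
| $7.81 \cdot 10^{-3}$ | 10 | 13 | 11 | | | 10 | 12 | 11 | 29 | 37 | 32 | | |
| $3.91 \cdot 10^{-3}$ | 11 | 15 | 12 | | | 12 | 14 | 12 | 32 | 42 | 36 | | |
| $1.95 \cdot 10^{-3}$ | 13 | 16 | 14 | | | 13 | 15 | 14 | 35 | 46 | 40 | | |
| $9.77 \cdot 10^{-4}$ | 14 | 17 | 15 | | | 14 | 17 | 15 | 39 | 50 | 43 | | |
| $4.88 \cdot 10^{-4}$ | 16 | 19 | 17 | | | 15 | 18 | 17 | 43 | 54 | 47 | | |
| $2.44 \cdot 10^{-4}$ | 17 | 20 | 18 | | | 17 | 20 | 18 | 47 | 58 | 51 | | |
| $1.22 \cdot 10^{-4}$ | 18 | 22 | 19 | | | 18 | 22 | 19 | 51 | 62 | 55 | | |
| $6.10 \cdot 10^{-5}$ | 19 | 23 | 21 | | | 19 | 23 | 21 | 55 | 66 | 59 | | |
| $3.05 \cdot 10^{-5}$ | 21 | 24 | 22 | | | 21 | 25 | 22 | 59 | 70 | 63 | | |
| $1.53 \cdot 10^{-5}$ | 22 | 25 | 23 | | | 22 | 26 | 23 | 62 | 74 | 67 | | |
| $7.63 \cdot 10^{-6}$ | 23 | 27 | 25 | | | 23 | 28 | 25 | 67 | 77 | 70 | | |
| $3.81 \cdot 10^{-6}$ | 25 | 28 | 26 | | | 24 | 29 | 26 | 71 | 81 | 74 | | |
| $1.91 \cdot 10^{-6}$ | 26 | 30 | 28 | | | 26 | 30 | 28 | 73 | 85 | 78 | | |

Table S-5: Funnel function $f_5$, S = 32, n = 64. #ITS/n, ($\mathcal{GM}$, $\mathcal{GM}_{LS}$: #ITS). 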



D.2. Number of function evaluations for increasing accuracy for fixed 
dimension n = 64. Tables S-6 to S-10 summarize the number of function evaluations 
needed to achieve a corresponding relative accuracy (acc) for fixed dimension n = 64. 
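| acc. | $\mathcal{RP}_\mu$ min | $\mathcal{RP}_\mu$ max | $\mathcal{RP}_\mu$ mean | $\mathcal{RG}$ min | $\mathcal{RG}$ max | $\mathcal{RG}$ mean | $\mathcal{FG}$ min | $\mathcal{FG}$ max | $\mathcal{FG}$ mean | $\mathcal{ARP}_\mu$ min | $\mathcal{ARP}_\mu$ max | $\mathcal{ARP}_\mu$ mean | $\mathcal{ES}$ min | $\mathcal{ES}$ max | $\mathcal{ES}$ mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $6.25 \cdot 10^{-2}$ | 10 | 14 | 12 | 11 | 15 | 14 | 12 | 15 | 13 | 9 | 13 | 11 | 6 | 9 | 8 |
| $3.12 \cdot 10^{-2}$ | 12 | 17 | 14 | 15 | 19 | 17 | 15 | 18 | 17 | 12 | 16 | 14 | 8 | 12 | 10 |
| $1.56 \cdot 10^{-2}$ | 15 | 21) | 17 | 18 | 23 | 20 | 18 | 21 | 20 | 15 | 19 | 17 | 10 | 14 | 12 |
| $7.81 \cdot 10^{-3}$ | 17 | 23 | 20 | 22 | 27 | 24 | 21 | 25 | 23 | 17 | 22 | 20 | 12 | 16 | 14 |
| $3.91 \cdot 10^{-3}$ | 20 | 27 | 22 | 25 | 30 | 27 | 24 | 28 | 26 | 20 | 25 | 22 | 14 | 18 | 16 |
| $1.95 \cdot 10^{-3}$ | 22 | 20 | 25 | 28 | 34 | 30 | 27 | 32 | 29 | 22 | 21) | 25 | 15 | 20 | 18 |
| $9.77 \cdot 10^{-4}$ | 25 | 33 | 28 | 32 | 37 | 34 | 29 | 35 | 33 | 24 | 32 | 27 | 17 | 22 | 20 |
| $4.88 \cdot 10^{-4}$ | 27 | 35 | 31 | 35 | 40 | 37 | 32 | 38 | 36 | 27 | 35 | 30 | 18 | 24 | 21 |
| $2.44 \cdot 10^{-4}$ | 30 | 38 | 33 | 38 | 44 | 41 | 35 | 42 | 39 | 29 | 38 | 33 | 20 | 26 | 23 |
| $1.22 \cdot 10^{-4}$ | 32 | 40 | 36 | 41 | 47 | 44 | 38 | 45 | 42 | 32 | 40 | 36 | 22 | 28 | 25 |
| $6.10 \cdot 10^{-5}$ | 35 | 42 | 39 | 44 | 52 | 47 | 41 | 48 | 45 | 36 | 43 | 38 | 23 | 31 | 27 |
| $3.05 \cdot 10^{-5}$ | 38 | 44 | 42 | 47 | 56 | 51 | 44 | 51 | 49 | 39 | 46 | 41 | 26 | 32 | 29 |
| $1.53 \cdot 10^{-5}$ | 41 | 48 | 44 | 51 | 58 | 54 | 47 | 55 | 52 | 41 | 49 | 44 | 28 | 35 | 31 |
| $7.63 \cdot 10^{-6}$ | 43 | 52 | 47 | 53 | 61 | 57 | 51 | 58 | 55 | 44 | 51 | 47 | 29 | 36 | 33 |
| $3.81 \cdot 10^{-6}$ | 45 | 54 | 50 | 57 | 65 | 60 | 54 | 63 | 58 | 47 | 53 | 49 | 31 | 38 | 35 |
| $1.91 \cdot 10^{-6}$ | 47 | 57 | 52 | 60 | 68 | 64 | | 66 | 61 | 50 | 56 | 52 | 33 | 41 | 37 |

Table S-6: Sphere function $f_1$, m = 1, L = 1, S = 32, n = 64. #FES/n. 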











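| acc. | $\mathcal{RP}_\mu$ min | $\mathcal{RP}_\mu$ max | $\mathcal{RP}_\mu$ mean | $\mathcal{RG}$ min | $\mathcal{RG}$ max | $\mathcal{RG}$ mean | $\mathcal{FG}$ min | $\mathcal{FG}$ max | $\mathcal{FG}$ mean | $\mathcal{ARP}_\mu$ min | $\mathcal{ARP}_\mu$ max | $\mathcal{ARP}_\mu$ mean | $\mathcal{ES}$ min | $\mathcal{ES}$ max | $\mathcal{ES}$ mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $6.25 \cdot 10^{-2}$ | 12 | 21 | 16 | 18 | 22 | 20 | 10 | 36 | 33 | 637 | 809 | 733 | | 9 | 6 |
| $3.12 \cdot 10^{-2}$ | 15 | 25 | 19 | 21 | 25 | 23 | 34 | 41 | 37 | 705 | 883 | 802 | 6 | 10 | 8 |
| $1.56 \cdot 10^{-2}$ | 20 | 32 | 25 | 25 | 30 | 28 | 62 | 717 | 522 | 777 | 966 | 872 | 7 | 19 | 10 |
| $7.81 \cdot 10^{-3}$ | 29 | 1671 | 662 | 30 | 39 | 34 | 680 | 847 | 760 | 863 | 1092 | 963 | 18 | 465 | 268 |
| $3.91 \cdot 10^{-3}$ | 1416 | 3925 | 2803 | 761 | 2200 | 1354 | 808 | 968 | 887 | 978 | 1418 | 1185 | 481 | 928 | 723 |
| $1.95 \cdot 10^{-3}$ | 3809 | 6240 | 5125 | 3725 | 5155 | 4300 | 928 | 1087 | 1008 | 1054 | 1613 | 1489 | 936 | 1410 | 1177 |
| $9.77 \cdot 10^{-4}$ | 6096 | 8577 | 7458 | 6681 | 8124 | 7248 | 1042 | 12113 | 1124 | 1526 | 1791) | 1688 | 1376 | 1869 | 1631 |
| $4.88 \cdot 10^{-4}$ | 8495 | 10961 | 9848 | 9629 | 11072 | 10189 | 1152 | 1315 | 1236 | 1702 | 1955 | 1835 | 1826 | 2325 | 2085 |
| $2.44 \cdot 10^{-4}$ | 10936 | 13455 | 12278 | 12560 | 14036 | 13129 | 1259 | 1425 | 1317 | 1844 | 2108 | 1966 | 2264 | 2774 | 2538 |
| $1.22 \cdot 10^{-4}$ | 13340 | 15896 | 14705 | 15507 | 16977 | 16072 | 1365 | 1533 | 1455 | 1960 | 2242 | 2088 | 2732 | 3239 | 2994 |
| $6.10 \cdot 10^{-5}$ | 15726 | 18390 | 17144 | 18463 | 19921 | 19015 | 1471 | 1641 | 1562 | 2068 | 2362 | 2211 | 3172 | 3690 | 3447 |
| $3.05 \cdot 10^{-5}$ | 18147 | 20800 | 19566 | 21423 | 22877 | 21960 | 1575 | 1747 | 1666 | 2188 | 2508 | 2353 | 3635 | 4138 | 3905 |
| $1.53 \cdot 10^{-5}$ | 20573 | 23183 | 21970 | 24357 | 25812 | 24906 | 1678 | 1851 | 1770 | 2389 | 2743 | 2552 | 4083 | 4593 | 4361 |
| $7.63 \cdot 10^{-6}$ | 22981 | 25521 | 24375 | 27308 | 28776 | 27846 | 1780 | 1954 | 1873 | 2627 | 2918 | 2789 | 4537 | 5042 | 4819 |
| $3.81 \cdot 10^{-6}$ | 25337 | 27890 | 26733 | 30259 | 31707 | 30790 | 1880 | 2057 | 1975 | 2857 | 3083 | 2993 | 4989 | 5492 | 5273 |
| $1.91 \cdot 10^{-6}$ | 27723 | 30272 | 29071 | 33202 | 34667 | 33736 | 1980 | 2159 | 2077 | 3035 | 3247 | 3159 | 5451 | 5954 | 5729 |

Table S-7: Ellipsoid function $f_2$, m = 1, L = 1000, S = 3200, n = 64. #FES/n. 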
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acc. 


— 5^5- 


TIV 

— °T 


mCa ° 




Tig 






TQ 

ml ™ 


m<!a ° 




A'R'P 


m0 "'' 




£S 




6.25 


10" 


2 








o 


Q 


Q 








(1 








(1 




3.12 


io- 


2 








Q 


1) 










Q 






g 


Q 




1.56 


io- 


2 











o 





o 














o 





o 


o 





7.81 


io- 


3 




8 


7 


7 


10 


8 




7 


6 


4 


12 


7 


2 


4 


3 


3.91 


io- 


3 


22 


33 


27 


36 


45 


42 


17 


25 


21 


29 


163 


66 


6 


10 


9 


1.95 


io- 


3 


73 


123 


96 


147 


169 


158 


58 


80 


67 


102 


604 


221 


22 


31 


26 


9.77 


io- 


4 


282 


410 


344 


511 


566 


538 


110 


167 


140 


198 


690 


375 


73 


102 


86 


4.88 


io- 


4 


930 


1304 


1094 


1579 


1675 


1621 


208 


305 


261 


301 


1264 


584 


233 


292 


257 


2.44 


io- 


4 


2440 


3232 


2788 


3987 


4130 


4045 


327 


517 


401 


328 


1716 


866 


577 


700 


633 


1.22 


io- 


4 


4996 


6186 


5588 


7909 


8106 


8009 


448 


656 


557 


574 


2132 


1332 


1142 


1344 


1249 


6.10 


io- 




8225 


9610 


8942 


12697 


12930 


12824 


586 


765 


696 


763 


2804 


1621 


1867 


2101 


1998 


3.05 


io- 




11739 


13117 


12445 


17699 


17928 


17833 


739 


855 


803 


903 


4485 


2372 


2641 


2891 


2780 


1.53 


io- 


5 


15220 


16645 


15948 


22750 


22963 


22870 


795 


1225 


885 


1004 


5201 


3010 


3406 


3692 


3563 


7.63 


io- 




18678 


20032 


19423 


27772 


28033 


27917 


900 


1788 


1355 


1410 


11242 


3979 


4223 


4461 


4348 


3.81 


io- 




22154 


23511 


22902 


32801 


33091 


32963 


926 


1870 


1775 


2145 


7381 


4691 


5008 


5254 


5133 


1.91 


io- 




25520 


27034 


26351 


37844 


38150 


38008 


1785 


1939 


1885 


2199 


8149 


5609 


5766 


6050 


5916 



Table S-8: Nesterov smooth $f_3$, L = 1000, S = 10833, n = 64. #FES/n. 



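| acc. | $\mathcal{RP}_\mu$ min | $\mathcal{RP}_\mu$ max | $\mathcal{RP}_\mu$ mean | $\mathcal{RG}$ min | $\mathcal{RG}$ max | $\mathcal{RG}$ mean | $\mathcal{FG}$ min | $\mathcal{FG}$ max | $\mathcal{FG}$ mean | $\mathcal{ARP}_\mu$ min | $\mathcal{ARP}_\mu$ max | $\mathcal{ARP}_\mu$ mean | $\mathcal{ES}$ min | $\mathcal{ES}$ max | $\mathcal{ES}$ mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $6.25 \cdot 10^{-2}$ | 8 | 15 | 12 | 13 | 18 | 15 | 8 | 11 | 9 | 8 | 52 | 15 | 3 | | 4 |
| $3.12 \cdot 10^{-2}$ | 29 | 45 | 35 | 50 | 61 | 56 | 21 | 32 | 26 | 43 | 173 | 93 | 8 | 13 | 11 |
| $1.56 \cdot 10^{-2}$ | 74 | 120 | 97 | 157 | 179 | 164 | 54 | 75 | 66 | 83 | 421 | 193 | 19 | 35 | 27 |
| $7.81 \cdot 10^{-3}$ | 198 | 342 | 255 | 397 | 438 | 408 | 93 | 139 | 119 | 184 | 497 | 290 | 52 | 76 | 65 |
| $3.91 \cdot 10^{-3}$ | 490 | 718 | 570 | 831 | 915 | 864 | 148 | 214 | 175 | 223 | 555 | 391 | 109 | 152 | 135 |
| $1.95 \cdot 10^{-3}$ | 969 | 1258 | 1071 | 1543 | 1648 | 1583 | 208 | 281 | 237 | 373 | K90 | 516 | 214 | 272 | 247 |
| $9.77 \cdot 10^{-4}$ | 1578 | 2012 | 1757 | 2495 | 2682 | 2568 | 276 | 354 | 314 | 477 | 989 | 662 | 352 | 447 | 399 |
| $4.88 \cdot 10^{-4}$ | 2374 | 2877 | 2605 | 3681 | 3929 | 3800 | 338 | 433 | 383 | 530 | 1144 | 763 | 533 | 638 | 587 |
| $2.44 \cdot 10^{-4}$ | 3260 | 3883 | 3607 | 5079 | 5387 | 5242 | 389 | 519 | 457 | 675 | 1209 | 876 | 740 | 875 | 810 |
| $1.22 \cdot 10^{-4}$ | 4315 | 5011 | 4710 | 6653 | 7026 | 6840 | 487 | 587 | 532 | 727 | 1370 | 1037 | 976 | 1132 | 1059 |
| $6.10 \cdot 10^{-5}$ | 5417 | 6220 | 5882 | 8347 | 8774 | 8555 | 563 | 671 | 612 | 978 | 1437 | 1179 | 1233 | 1403 | 1325 |
| $3.05 \cdot 10^{-5}$ | 6621 | 7435 | 7116 | 10113 | 10577 | 10337 | 642 | 752 | 685 | 1074 | 1782 | 1323 | 1508 | 1681 | 1603 |
| $1.53 \cdot 10^{-5}$ | 7872 | 8772 | 8371 | 11937 | 12411 | 12159 | 702 | 804 | 753 | 1212 | 1882 | 1468 | 1788 | 1974 | 1885 |
| $7.63 \cdot 10^{-6}$ | 9069 | 10005 | 9628 | 13786 | 14275 | 14002 | 766 | 861 | 818 | 1286 | 1986 | 1559 | 2071 | 2209 | 2173 |
| $3.81 \cdot 10^{-6}$ | 10331 | 11264 | 10885 | 15633 | 16129 | 15852 | 845 | 906 | 870 | 1448 | 2078 | 1687 | 2357 | 2568 | 2462 |
| $1.91 \cdot 10^{-6}$ | 11629 | 12482 | 12122 | 17455 | 17990 | 17708 | 883 | 1069 | 916 | 1557 | 2134 | 1825 | 2651 | 2854 | 2751 |

Table S-9: Nesterov strongly convex function $f_4$, m = 1, L = 1000, S = 1000, n = 64. 
#FES/n. 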









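| acc. | $\mathcal{RP}_\mu$ min | $\mathcal{RP}_\mu$ max | $\mathcal{RP}_\mu$ mean | $\mathcal{RG}$ (min max mean) | $\mathcal{FG}$ (min max mean) | $\mathcal{ARP}_\mu$ min | $\mathcal{ARP}_\mu$ max | $\mathcal{ARP}_\mu$ mean | $\mathcal{ES}$ min | $\mathcal{ES}$ max | $\mathcal{ES}$ mean |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $6.25 \cdot 10^{-2}$ | 37 | 58 | 45 | | | 40 | 51 | 46 | 13 | 17 | 14 |
| $3.12 \cdot 10^{-2}$ | 63 | 86 | 71 | | | 62 | 76 | 70 | 19 | 25 | 22 |
| $1.56 \cdot 10^{-2}$ | 83 | 110 | 92 | | | 84 | 102 | 92 | 24 | 32 | 27 |
| $7.81 \cdot 10^{-3}$ | 103 | 129 | 112 | | | 103 | 123 | 112 | 29 | 37 | 32 |
| $3.91 \cdot 10^{-3}$ | 118 | 150 | 132 | | | 123 | 144 | 132 | 32 | 42 | 36 |
| $1.95 \cdot 10^{-3}$ | 140 | 168 | 151 | | | 140 | 167 | 151 | 35 | 46 | 40 |
| $9.77 \cdot 10^{-4}$ | 159 | 187 | 171 | | | 159 | 191 | 171 | 39 | 50 | 43 |
| $4.88 \cdot 10^{-4}$ | 178 | 207 | 190 | | | 177 | 212 | 192 | 43 | 54 | 47 |
| $2.44 \cdot 10^{-4}$ | 196 | 2311 | 2111 | | | 197 | 231 | 211 | 47 | 58 | 51 |
| $1.22 \cdot 10^{-4}$ | 215 | 252 | 232 | | | 218 | 205 | 234 | 51 | 62 | |
| $6.10 \cdot 10^{-5}$ | 236 | 27(1 | 252 | | | 239 | 289 | 255 | 55 | 66 | 59 |
| $3.05 \cdot 10^{-5}$ | 257 | 293 | 273 | | | 257 | 313 | 275 | 59 | 70 | 63 |
| $1.53 \cdot 10^{-5}$ | 279 | 316 | 294 | | | 277 | 330 | 295 | 62 | 74 | 67 |
| $7.63 \cdot 10^{-6}$ | 297 | 340 | 316 | | | 295 | 355 | 317 | 67 | 77 | 70 |
| $3.81 \cdot 10^{-6}$ | 320 | 365 | 339 | | | 318 | 378 | 338 | 71 | 81 | 74 |
| $1.91 \cdot 10^{-6}$ | 338 | 384 | 360 | | | 342 | 399 | 361 | 73 | 85 | 78 |

Table S-10: Funnel function $f_5$, S = 32, n = 64. #FES/n. 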



D.3. Different line search parameters for fixed dimension n = 64. Tables S-11 and S-12 summarize the number of iterations and function evaluations needed by $\mathcal{RP}_\mu$ on the ellipsoid function $f_2$ to achieve a relative accuracy of $1.91 \cdot 10^{-6}$ for different parameters $\mu$ that were passed to the Matlab line search (cf. Figure S-2). 



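| acc. | 1E-1 | 1E-2 | 1E-3 | 1E-4 | 1E-5 | 1E-6 | 1E-7 | 1E-8 | 1E-9 | 1E-10 |
|---|---|---|---|---|---|---|---|---|---|---|
| $6.25 \cdot 10^{-2}$ | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| $3.12 \cdot 10^{-2}$ | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| $1.56 \cdot 10^{-2}$ | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| $7.81 \cdot 10^{-3}$ | 57 | 50 | 56 | 65 | 53 | 53 | 53 | 53 | 73 | 73 |
| $3.91 \cdot 10^{-3}$ | 201 | 194 | 219 | 225 | 210 | 210 | 210 | 210 | 232 | 232 |
| $1.95 \cdot 10^{-3}$ | 360 | 353 | 382 | 389 | 373 | 373 | 373 | 373 | 395 | 395 |
| $9.77 \cdot 10^{-4}$ | 523 | 516 | 546 | 552 | 536 | 536 | 536 | 536 | 558 | 558 |
| $4.88 \cdot 10^{-4}$ | 685 | 678 | 709 | 715 | 700 | 700 | 700 | 700 | 721 | 721 |
| $2.44 \cdot 10^{-4}$ | 849 | 842 | 871 | 878 | 862 | 862 | 862 | 862 | 883 | 883 |
| $1.22 \cdot 10^{-4}$ | 1011 | 1004 | 1035 | 1041 | 1024 | 1024 | 1024 | 1024 | 1045 | 1045 |
| $6.10 \cdot 10^{-5}$ | 1173 | 1166 | 1198 | 1203 | 1187 | 1187 | 1187 | 1187 | 1208 | 1208 |
| $3.05 \cdot 10^{-5}$ | 1335 | 1330 | 1360 | 1366 | 1350 | 1350 | 1350 | 1350 | 1370 | 1370 |
| $1.53 \cdot 10^{-5}$ | 1497 | 1494 | 1523 | 1528 | 1512 | 1512 | 1512 | 1512 | 1533 | 1533 |
| $7.63 \cdot 10^{-6}$ | 1661 | 1657 | 1686 | 1691 | 1675 | 1675 | 1675 | 1675 | 1696 | 1696 |
| $3.81 \cdot 10^{-6}$ | 1824 | 1819 | 1849 | 1853 | 1837 | 1837 | 1837 | 1837 | 1858 | 1858 |
| $1.91 \cdot 10^{-6}$ | 1986 | 1982 | 2013 | 2017 | 2001 | 2001 | 2001 | 2001 | 2020 | 2020 |

Table S-11: Different line search parameters $\mu$ for $\mathcal{RP}_\mu$ on the ellipsoid function $f_2$, m = 1, 
L = 1000, S = 3200, n = 64; mean of 25 runs of #ITS/n to reach a relative 
accuracy of $1.91 \cdot 10^{-6}$. 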



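| acc. | 1E-1 | 1E-2 | 1E-3 | 1E-4 | 1E-5 | 1E-6 | 1E-7 | 1E-8 | 1E-9 | 1E-10 |
|---|---|---|---|---|---|---|---|---|---|---|
| $6.25 \cdot 10^{-2}$ | 15 | 10 | 16 | 17 | 10 | 16 | 16 | 16 | 16 | 16 |
| $3.12 \cdot 10^{-2}$ | 18 | 20 | 20 | 21 | 19 | 19 | 19 | 19 | 20 | 20 |
| $1.56 \cdot 10^{-2}$ | 24 | 25 | 25 | 26 | 25 | 25 | 25 | 25 | 26 | 26 |
| $7.81 \cdot 10^{-3}$ | 542 | 482 | 657 | 815 | 662 | 662 | 662 | 662 | 969 | 970 |
| $3.91 \cdot 10^{-3}$ | 1973 | 1908 | 2664 | 2983 | 2803 | 2805 | 2805 | 2806 | 3378 | 3390 |
| $1.95 \cdot 10^{-3}$ | 3565 | 3495 | 4651 | 5273 | 5125 | 5131 | 5131 | 5131 | 6140 | 6173 |
| $9.77 \cdot 10^{-4}$ | 5187 | 5120 | 6544 | 7558 | 7458 | 7469 | 7469 | 7470 | 9078 | 9126 |
| $4.88 \cdot 10^{-4}$ | 6813 | 6750 | 8319 | 9867 | 9848 | 9866 | 9867 | 9869 | 12224 | 12277 |
| $2.44 \cdot 10^{-4}$ | 8447 | 8380 | 10005 | 12191 | 12278 | 12307 | 12310 | 12311 | 15528 | 15582 |
| $1.22 \cdot 10^{-4}$ | 10069 | 10008 | 11653 | 14484 | 14705 | 14750 | 14754 | 14756 | 18872 | 18926 |
| $6.10 \cdot 10^{-5}$ | 11687 | 11627 | 13285 | 16693 | 17144 | 17212 | 17218 | 17220 | 22238 | 22293 |
| $3.05 \cdot 10^{-5}$ | 13309 | 13264 | 14908 | 18817 | 19566 | 19664 | 19674 | 19676 | 25576 | 25633 |
| $1.53 \cdot 10^{-5}$ | 14935 | 14902 | 16537 | 20820 | 21970 | 22114 | 22128 | 22131 | 28935 | 28996 |
| $7.63 \cdot 10^{-6}$ | 16572 | 16531 | 18167 | 22698 | 24375 | 24582 | 24602 | 24606 | 32308 | 32373 |
| $3.81 \cdot 10^{-6}$ | 18198 | 18155 | 19798 | 24449 | 26733 | 27030 | 27059 | 27064 | 35655 | 35726 |
| $1.91 \cdot 10^{-6}$ | 19824 | 19781 | 21435 | 26120 | 29071 | 29495 | 29537 | 29542 | 38993 | 39070 |

Table S-12: Different line search parameters $\mu$ for $\mathcal{RP}_\mu$ on the ellipsoid function $f_2$, m = 1, 
L = 1000, S = 3200, n = 64; mean of 25 runs of #FES/n to reach a relative 
accuracy of $1.91 \cdot 10^{-6}$. 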



